Abstract

We review the properties of the quantum relative entropy function and
discuss its application to problems of classical and quantum information
transfer and to quantum data compression. We then outline further uses of
relative entropy to quantify quantum entanglement and analyze its
manipulation.

1 Quantum relative entropy 

In this paper we discuss several uses of the quantum relative entropy function 
in quantum information theory. Relative entropy methods have a number of 
advantages. First of all, the relative entropy functional satisfies some strong 
identities and inequalities, providing a basis for good theorems. Secondly, 
the relative entropy has a natural interpretation in terms of the statistical 
distinguishability of quantum states; closely related to this is the picture of 
relative entropy as a "distance" measure between density operators. These 
interpretations of the relative entropy give insight about the meaning of the 
mathematical constructions that use it. Finally, relative entropy has found 
a wide variety of applications in quantum information theory.

The usefulness of relative entropy in quantum information theory should
come as no surprise, since the classical relative entropy has shown its power 
as a unifying concept in classical information theory [1]. Indeed, some of 
the results we will describe have close analogues in the classical domain. 
Nevertheless, the quantum relative entropy can provide insights in contexts 
(such as the quantification of quantum entanglement) that have no parallel 
in classical ideas of information. 

Let Q be a quantum system described by a Hilbert space $\mathcal{H}$.
(Throughout this paper, we will restrict our attention to systems with
Hilbert spaces having a finite number of dimensions.) A pure state of Q can
be described by a normalized vector $|\psi\rangle$ in $\mathcal{H}$, but a
general (mixed) state requires a density operator $\rho$, which is a positive
semi-definite operator on $\mathcal{H}$ with unit trace. For the pure state
the density operator $\rho$ is simply the projection operator
$|\psi\rangle\langle\psi|$; otherwise, $\rho$ is a convex combination of
projections. The entropy $S(\rho)$ is defined to be

$$S(\rho) = -\mathrm{Tr}\,\rho \log \rho. \tag{1}$$

The entropy is non-negative and equals zero if and only if $\rho$ is a pure
state. (By "log" we will mean a logarithm with base 2.)

Closely related to the entropy of a state is the relative entropy of a pair
of states. Let $\rho$ and $\sigma$ be density operators, and define the
quantum relative entropy $S(\rho\|\sigma)$ to be

$$S(\rho\|\sigma) = \mathrm{Tr}\,\rho \log \rho - \mathrm{Tr}\,\rho \log \sigma. \tag{2}$$

(We read this as "the relative entropy of $\rho$ with respect to $\sigma$".)
This function has a number of useful properties:

1. $S(\rho\|\sigma) \geq 0$, with equality if and only if $\rho = \sigma$.

2. $S(\rho\|\sigma) < \infty$ if and only if
$\mathrm{supp}\,\rho \subseteq \mathrm{supp}\,\sigma$. (Here
"$\mathrm{supp}\,\rho$" is the subspace spanned by eigenvectors of $\rho$
with non-zero eigenvalues.)

3. The relative entropy is continuous where it is not infinite.

4. The relative entropy is jointly convex in its arguments. That is, if
$\rho_1$, $\rho_2$, $\sigma_1$ and $\sigma_2$ are density operators, and
$p_1$ and $p_2$ are non-negative numbers that sum to unity (i.e.,
probabilities), then

$$S(\rho\|\sigma) \leq p_1 S(\rho_1\|\sigma_1) + p_2 S(\rho_2\|\sigma_2), \tag{3}$$

where $\rho = p_1\rho_1 + p_2\rho_2$ and $\sigma = p_1\sigma_1 + p_2\sigma_2$.
Joint convexity automatically implies convexity in each argument, so that
(for example)

$$S(\rho\|\sigma) \leq p_1 S(\rho_1\|\sigma) + p_2 S(\rho_2\|\sigma). \tag{4}$$

The properties, especially property (1), motivate us to think of the relative 
entropy as a kind of "distance" between density operators. The relative 
entropy, which is not symmetric and which lacks a triangle inequality, is 
not technically a metric; but it is a positive definite directed measure of the 
separation of two density operators. 

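Suppose the density operator $\rho_k$ occurs with probability $p_k$, yielding
an average state $\rho = \sum_k p_k \rho_k$, and suppose $\sigma$ is some
other density operator. Then

$$\begin{aligned}
\sum_k p_k S(\rho_k\|\sigma)
  &= \sum_k p_k \left( \mathrm{Tr}\,\rho_k \log \rho_k - \mathrm{Tr}\,\rho_k \log \sigma \right) \\
  &= \sum_k p_k \left( \mathrm{Tr}\,\rho_k \log \rho_k - \mathrm{Tr}\,\rho_k \log \rho
       + \mathrm{Tr}\,\rho_k \log \rho - \mathrm{Tr}\,\rho_k \log \sigma \right) \\
  &= \sum_k p_k \left( \mathrm{Tr}\,\rho_k \log \rho_k - \mathrm{Tr}\,\rho_k \log \rho \right)
       + \mathrm{Tr}\,\rho \log \rho - \mathrm{Tr}\,\rho \log \sigma,
\end{aligned}$$

$$\sum_k p_k S(\rho_k\|\sigma) = \sum_k p_k S(\rho_k\|\rho) + S(\rho\|\sigma). \tag{5}$$

Equation 5 is known as Donald's identity.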
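Donald's identity is easy to check numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `random_density`, `rel_entropy` and `donald_sides` are illustrative choices and are not defined anywhere in this paper. It evaluates both sides of Equation 5 for a randomly generated three-state ensemble.

```python
import numpy as np

def random_density(d, rng):
    """Random full-rank density operator on a d-dimensional space."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T              # positive semi-definite by construction
    return rho / np.trace(rho).real   # normalize to unit trace

def rel_entropy(rho, sigma, tol=1e-12):
    """S(rho||sigma) = Tr rho log rho - Tr rho log sigma, in bits."""
    r_vals = np.linalg.eigvalsh(rho)
    s_vals, s_vecs = np.linalg.eigh(sigma)
    # Diagonal elements of rho in the eigenbasis of sigma.
    weights = np.real(np.diag(s_vecs.conj().T @ rho @ s_vecs))
    if np.any((s_vals <= tol) & (weights > tol)):
        return np.inf                 # supp rho is not contained in supp sigma
    keep_r = r_vals > tol
    keep_s = s_vals > tol
    tr_rho_log_rho = np.sum(r_vals[keep_r] * np.log2(r_vals[keep_r]))
    tr_rho_log_sigma = np.sum(weights[keep_s] * np.log2(s_vals[keep_s]))
    return float(tr_rho_log_rho - tr_rho_log_sigma)

def donald_sides(probs, states, sigma):
    """Return the left and right sides of Donald's identity (Equation 5)."""
    rho = sum(p * r for p, r in zip(probs, states))
    left = sum(p * rel_entropy(r, sigma) for p, r in zip(probs, states))
    right = sum(p * rel_entropy(r, rho) for p, r in zip(probs, states))
    right += rel_entropy(rho, sigma)
    return left, right

rng = np.random.default_rng(0)        # arbitrary seed, for reproducibility only
probs = [0.5, 0.3, 0.2]
states = [random_density(3, rng) for _ in probs]
sigma = random_density(3, rng)
left, right = donald_sides(probs, states, sigma)
print(left, right)
```

The two printed numbers should agree up to floating-point rounding. The `tol` cutoff implements the convention $0 \log 0 = 0$ and the support condition of property 2.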

The classical relative entropy of two probability distributions is related to
the probability of distinguishing the two distributions after a large but
finite number of independent samples. This is called Sanov's theorem, and
the result has a quantum analogue. Suppose $\rho$ and $\sigma$ are two
possible states of the quantum system Q, and suppose we are provided with
$N$ identically prepared copies of Q. A measurement is made to determine
whether the prepared state is $\rho$, and the probability $P_N$ that the
state $\sigma$ passes this test (in other words, is confused with $\rho$) is

$$P_N \approx 2^{-N S(\rho\|\sigma)} \tag{6}$$

as $N \to \infty$. (We have assumed that the measurement made is an optimal
one for the purpose, and it is possible to show that an asymptotically
optimal measurement strategy can be found that depends on $\rho$ but not
$\sigma$.)

The quantum version of Sanov's theorem tells us that the quantum relative
entropy governs the asymptotic distinguishability of one quantum state
from another by means of measurements. This further supports the view of
$S(\cdot\|\cdot)$ as a measure of "distance"; two states are "close" if they
are difficult to distinguish, but "far apart" if the probability of
confusing them is small.

The remainder of this paper is organized as follows. Sections 2-5 apply 
relative entropy methods to the problem of sending classical information by 
means of a (possibly noisy) quantum channel. Sections 6-7 consider the 
transmission and compression of quantum information. Sections 8-9 then 
apply relative entropy methods to the discussion of quantum entanglement 
and its manipulation by local operations and classical communication. We 
conclude with a few remarks in Section 10. 

2 Classical communication via quantum channels

One of the oldest problems in quantum information theory is that of sending
classical information via quantum channels. A sender ("Alice") wishes to
transmit classical information to a receiver ("Bob") using a quantum system
as a communication channel. Alice will represent the message $a$, which
occurs with probability $p_a$, by preparing the channel in the "signal state"
represented by the density operator $\rho_a$. The average state of the
channel will thus be $\rho = \sum_a p_a \rho_a$. Bob will attempt to recover
the message by making a measurement of some "decoding observable" on the
channel system.

The states $\rho_a$ should be understood here as the "output" states of the
channel, the states that Bob will attempt to distinguish in his measurement.
In other words, the states $\rho_a$ already include the effects of the
dynamical evolution of the channel (including noise) on its way from sender
to receiver. The dynamics of the channel will be described by a
trace-preserving, completely positive map $\mathcal{E}$ on density operators.
The effect of $\mathcal{E}$ is simply to restrict the set of output channel
states that Alice can arrange for Bob to receive. If $\mathcal{D}$ is the set
of all density operators, then Alice's efforts can only produce output states
in the set $\mathcal{A} = \mathcal{E}(\mathcal{D})$, a convex, compact set of
density operators.

Bob's decoding observable is represented by a set of positive operators
$E_b$ such that $\sum_b E_b = 1$. If Bob makes his measurement on the state
$\rho_a$, then the conditional probability of measurement outcome $b$ is

$$P(b|a) = \mathrm{Tr}\,\rho_a E_b. \tag{7}$$

This yields a joint distribution over Alice's input messages $a$ and Bob's
decoded messages $b$:

$$P(a,b) = p_a P(b|a). \tag{8}$$

Once a joint probability distribution exists between the input and output
messages (random variables $A$ and $B$, respectively), the information
transfer can be analyzed by classical information theory. The information
obtained by Bob is given by the mutual information $I(A:B)$:

$$I(A:B) = H(A) + H(B) - H(A,B), \tag{9}$$

where $H$ is the Shannon entropy function

$$H(X) = -\sum_x p(x) \log p(x). \tag{10}$$

Shannon showed that, if the channel is used many times with suitable
error-correcting codes, then any amount of information up to $I(A:B)$ bits
(per use of the channel) can be sent from Alice to Bob with arbitrarily low
probability of error [1]. The classical capacity of the channel is
$C = \max I(A:B)$, where the maximum is taken over all input probability
distributions. $C$ is thus the maximum amount of information that may be
reliably conveyed per use of the channel.

In the quantum mechanical situation, for a given ensemble of signal states
$\rho_a$, Bob has many different choices for his decoding observable. Unless
the signal states happen to be orthogonal, no choice of observable will allow
Bob to distinguish perfectly between them. A theorem stated by Gordon and
Levitin and first proved by Holevo states that the amount of information
accessible to Bob is limited by $I(A:B) \leq \chi$, where

$$\chi = S(\rho) - \sum_a p_a S(\rho_a). \tag{11}$$

The quantity $\chi$ is non-negative, since the entropy $S$ is concave.
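To make the bound $I(A:B) \leq \chi$ concrete, the sketch below builds the joint distribution of Equations 7 and 8 for one particular decoding observable and compares the resulting mutual information with $\chi$. It assumes NumPy; the two signal states ($|0\rangle$ and $(|0\rangle + |1\rangle)/\sqrt{2}$, equally likely) and the measurement basis are hypothetical choices made only for illustration.

```python
import numpy as np

def shannon(probabilities):
    """Shannon entropy H in bits, with the convention 0 log 0 = 0."""
    p = np.asarray(probabilities, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def von_neumann(rho):
    """von Neumann entropy S(rho) in bits (Equation 1)."""
    return shannon(np.clip(np.linalg.eigvalsh(rho), 0.0, None))

def mutual_information(priors, states, povm):
    """I(A:B) of Equation 9 for signal states and a decoding observable."""
    # joint[a, b] = p_a * Tr(rho_a E_b), as in Equations 7 and 8.
    joint = np.array([[p * np.trace(rho @ e).real for e in povm]
                      for p, rho in zip(priors, states)])
    return shannon(joint.sum(axis=1)) + shannon(joint.sum(axis=0)) - shannon(joint)

def holevo_chi(priors, states):
    """chi = S(rho) - sum_a p_a S(rho_a) (Equation 11)."""
    rho = sum(p * r for p, r in zip(priors, states))
    return von_neumann(rho) - sum(p * von_neumann(r) for p, r in zip(priors, states))

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
ket_plus = (ket0 + ket1) / np.sqrt(2)

priors = [0.5, 0.5]
states = [np.outer(ket0, ket0), np.outer(ket_plus, ket_plus)]  # non-orthogonal signals
povm = [np.outer(ket0, ket0), np.outer(ket1, ket1)]            # measure in the {|0>, |1>} basis

info = mutual_information(priors, states, povm)
chi = holevo_chi(priors, states)
print(f"I(A:B) = {info:.4f} bits, chi = {chi:.4f} bits")
```

This particular observable is not claimed to be optimal; any other choice of `povm` summing to the identity should still give a mutual information no larger than `chi`.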

More recently, Holevo and Schumacher and Westmoreland have shown that this
upper bound on $I(A:B)$ is asymptotically achievable. If Alice uses the same
channel many times and prepares long codewords of signal states, and Bob
uses an entangled decoding observable to distinguish these codewords, then
Alice can convey to Bob up to $\chi$ bits of information per use of the
channel, with arbitrarily low probability of error. (This fact was
established for pure state signals $\rho_a = |\psi_a\rangle\langle\psi_a|$
in [12]. In this case, $\chi = S(\rho)$.)

The Holevo bound $\chi$ can be expressed in terms of the relative entropy:

$$\begin{aligned}
\chi &= -\mathrm{Tr}\,\rho \log \rho + \sum_a p_a \mathrm{Tr}\,\rho_a \log \rho_a \\
     &= \sum_a p_a \left( \mathrm{Tr}\,\rho_a \log \rho_a - \mathrm{Tr}\,\rho_a \log \rho \right),
\end{aligned}$$

$$\chi = \sum_a p_a S(\rho_a\|\rho). \tag{12}$$

In geometric terms, $\chi$ is the average relative entropy "directed
distance" from the average state $\rho$ to the members of the signal
ensemble.

Donald's identity (Equation 5) has a particularly simple form in terms of
$\chi$. Given an ensemble and an additional state $\sigma$,

$$\sum_a p_a S(\rho_a\|\sigma) = \chi + S(\rho\|\sigma). \tag{13}$$

This implies, among other things, that

$$\chi \leq \sum_a p_a S(\rho_a\|\sigma), \tag{14}$$

with equality if and only if $\sigma = \rho$, the ensemble average state.
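The equivalence of the two expressions for $\chi$ (Equations 11 and 12) and the bound of Equation 14 can be illustrated numerically. This is a sketch under stated assumptions: NumPy is available, the ensemble is a randomly generated set of mixed qubit states, and the function names are hypothetical.

```python
import numpy as np

def random_density(d, rng):
    """Random full-rank density operator on a d-dimensional space."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def von_neumann(rho, tol=1e-12):
    """von Neumann entropy S(rho) in bits."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > tol]
    return float(-np.sum(vals * np.log2(vals)))

def rel_entropy(rho, sigma, tol=1e-12):
    """Quantum relative entropy S(rho||sigma) in bits (Equation 2)."""
    r_vals = np.linalg.eigvalsh(rho)
    s_vals, s_vecs = np.linalg.eigh(sigma)
    weights = np.real(np.diag(s_vecs.conj().T @ rho @ s_vecs))
    if np.any((s_vals <= tol) & (weights > tol)):
        return np.inf
    keep_r, keep_s = r_vals > tol, s_vals > tol
    return float(np.sum(r_vals[keep_r] * np.log2(r_vals[keep_r]))
                 - np.sum(weights[keep_s] * np.log2(s_vals[keep_s])))

rng = np.random.default_rng(7)          # arbitrary seed
probs = [0.6, 0.3, 0.1]
states = [random_density(2, rng) for _ in probs]
rho = sum(p * r for p, r in zip(probs, states))

# Equation 11: chi from entropies.
chi_entropy = von_neumann(rho) - sum(p * von_neumann(r) for p, r in zip(probs, states))
# Equation 12: chi as the average relative entropy to the average state.
chi_relative = sum(p * rel_entropy(r, rho) for p, r in zip(probs, states))

# Equations 13 and 14: average distance to any other state sigma is at least chi.
sigma = random_density(2, rng)
avg_distance_to_sigma = sum(p * rel_entropy(r, sigma) for p, r in zip(probs, states))
print(chi_entropy, chi_relative, avg_distance_to_sigma)
```

The gap `avg_distance_to_sigma - chi_relative` equals $S(\rho\|\sigma)$, which is how Equation 13 turns into the inequality of Equation 14.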



3 Thermodynamic cost of communication 

In this section and the next, we focus on the transfer of classical information 
by means of a quantum channel. 

Imagine a student who attends college far from home [13]. Naturally, the
student's family wants to know that the student is passing his classes, and
so they want the student to report to them frequently over the telephone.
But the student is poor and cannot afford very many long-distance telephone
calls. So they make the following arrangement: each evening at the same
time, the poor student will call home only if he is failing one or more of
his classes. Otherwise, he will save the phone charges by not calling home.

Every evening that the poor student does not call, therefore, the family
is receiving a message via the telephone that his grades are good. (That the 
telephone is being used for this message can be seen from the fact that, if 
the phone lines are knocked out for some reason, the family can no longer 
make any inference from the absence of a phone call.) 

For simplicity, imagine that the student's grades on successive days are
independent and that the probability that the student will be failing on
a given evening is $p$. Then the information conveyed each evening by the
presence or absence of a phone call is

$$H(p) = -p \log p - (1-p) \log (1-p). \tag{15}$$

The cost of making a phone call is $c$, while not making a phone call is
free. Thus, the student's average phone charge is $cp$ per evening. The
number of bits of information per unit cost is thus

$$\frac{H(p)}{cp} = \frac{1}{cp} \left( -p \log p - (1-p) \log (1-p) \right). \tag{16}$$

If the poor student is very successful in his studies, so that $p \to 0$,
then this ratio becomes unboundedly large, even though both $H(p) \to 0$ and
$cp \to 0$. That is, the student is able to send an arbitrarily large number
of bits per unit cost. There is no irreducible cost for sending one bit of
information over the telephone.
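The divergence of the ratio in Equation 16 can be seen by direct evaluation. The short sketch below uses only the Python standard library; the cost $c = 1$ and the sample values of $p$ are arbitrary illustrative choices.

```python
import math

def binary_entropy(p):
    """H(p) of Equation 15, in bits, with 0 log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bits_per_cost(p, c=1.0):
    """Ratio H(p) / (c p) of Equation 16 for failure probability p and call cost c."""
    return binary_entropy(p) / (c * p)

for p in (0.5, 0.1, 0.01, 0.001):
    print(f"p = {p:<6} H(p) = {binary_entropy(p):.4f}  bits per unit cost = {bits_per_cost(p):.3f}")
```

As $p$ decreases, both the information per evening and the average charge shrink, but their ratio keeps growing, roughly like $\log(1/p)$.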

The key idea in the story of the poor student is that one possible signal
(no phone call at all) has no cost to the student. The student can exploit
this fact to use the channel in a cost-effective way, by using the zero-cost
signal almost all of the time.

Instead of a poor student using a telephone, we can consider an analogous
quantum mechanical problem. Suppose that a sender can manipulate a quantum
channel to produce (for the receiver) one of two possible states, $\rho_0$
or $\rho_1$. The state $\rho_0$ can be produced at "zero cost", while the
state $\rho_1$ costs a finite amount $c_1 > 0$ to produce. In the signal
ensemble, the signal state $\rho_1$ is used with probability $\eta$ and
$\rho_0$ with probability $1 - \eta$, leading to an average state

$$\rho = (1-\eta)\rho_0 + \eta\rho_1. \tag{17}$$

The average cost of creating a signal is thus $c = \eta c_1$. For this
ensemble,

$$\chi = (1-\eta) S(\rho_0\|\rho) + \eta S(\rho_1\|\rho). \tag{18}$$

As discussed in the previous section, $\chi$ is an asymptotically achievable
upper bound for the information transferred by the channel.

An upper bound for $\chi$ can be obtained from Donald's identity. Letting
$\rho_0$ be the "additional" state,

$$\chi \leq (1-\eta) S(\rho_0\|\rho_0) + \eta S(\rho_1\|\rho_0) = \eta S(\rho_1\|\rho_0). \tag{19}$$

Combining this with a simple lower bound, we obtain

$$\eta S(\rho_1\|\rho) \leq \chi \leq \eta S(\rho_1\|\rho_0). \tag{20}$$

If we divide $\chi$ by the average cost, we find an asymptotically achievable
upper bound for the number of bits sent through the channel per unit cost.
That is,

$$\frac{\chi}{c} \leq \frac{1}{c_1} S(\rho_1\|\rho_0). \tag{21}$$

Furthermore, equality holds in the limit that $\eta \to 0$. Thus,

$$\sup \frac{\chi}{c} = \frac{1}{c_1} S(\rho_1\|\rho_0). \tag{22}$$

In short, the relative entropy "distance" between the signal state $\rho_1$
and the "zero cost" signal $\rho_0$ gives the largest possible number of bits
per unit cost that may be sent through the channel, the "cost effectiveness"
of the channel. If the state $\rho_0$ is a pure state, or if we can find a
usable signal state $\rho_1$ whose support is not contained in the support
of $\rho_0$, then $S(\rho_1\|\rho_0) = \infty$ and the cost effectiveness of
the channel goes to infinity as $\eta \to 0$. (This is parallel to the
situation of the poor student, who can make the ratio of "bits transmitted"
to "average cost" arbitrarily large.)
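The approach of $\chi/c$ to the bound $S(\rho_1\|\rho_0)/c_1$ as $\eta \to 0$ can be illustrated for a qubit. This is a minimal sketch assuming NumPy; the mixed zero-cost state `rho0`, the pure signal state `rho1` and the cost `c1` are hypothetical values chosen so that the bound is finite.

```python
import numpy as np

def rel_entropy(rho, sigma, tol=1e-12):
    """Quantum relative entropy S(rho||sigma) in bits (Equation 2)."""
    r_vals = np.linalg.eigvalsh(rho)
    s_vals, s_vecs = np.linalg.eigh(sigma)
    weights = np.real(np.diag(s_vecs.conj().T @ rho @ s_vecs))
    if np.any((s_vals <= tol) & (weights > tol)):
        return np.inf
    keep_r, keep_s = r_vals > tol, s_vals > tol
    return float(np.sum(r_vals[keep_r] * np.log2(r_vals[keep_r]))
                 - np.sum(weights[keep_s] * np.log2(s_vals[keep_s])))

rho0 = np.diag([0.9, 0.1])               # hypothetical zero-cost state (mixed)
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho1 = np.outer(plus, plus)              # hypothetical costly signal state (pure)
c1 = 1.0                                 # hypothetical cost of preparing rho1

bound = rel_entropy(rho1, rho0) / c1     # right-hand side of Equations 21 and 22

def chi_per_cost(eta):
    """chi / c for the two-state ensemble, with chi from Equation 18 and c = eta * c1."""
    rho = (1 - eta) * rho0 + eta * rho1
    chi = (1 - eta) * rel_entropy(rho0, rho) + eta * rel_entropy(rho1, rho)
    return chi / (eta * c1)

ratios = {eta: chi_per_cost(eta) for eta in (0.5, 0.1, 0.01, 0.001)}
for eta, ratio in ratios.items():
    print(f"eta = {eta:<6} chi/c = {ratio:.4f}   bound = {bound:.4f}")
```

Every ratio stays below `bound`, and the gap closes as `eta` shrinks, which is the "poor student" strategy of using the zero-cost signal almost all of the time.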

What if there are many possible signal states $\rho_1$, $\rho_2$, etc., with
positive costs $c_1$, $c_2$, and so on? If we assign the probability
$\eta q_k$ to $\rho_k$ for $k = 1, 2, \ldots$ (where $\sum_k q_k = 1$), and
use $\rho_0$ with probability $1 - \eta$, then we obtain

$$\eta \sum_k q_k S(\rho_k\|\rho) \leq \chi \leq \eta \sum_k q_k S(\rho_k\|\rho_0). \tag{23}$$

The average cost of the channel is $c = \eta \sum_k q_k c_k$. This means that

$$\frac{\chi}{c} \leq \frac{\sum_k q_k S(\rho_k\|\rho_0)}{\sum_k q_k c_k}. \tag{24}$$

We now note the following fact about real numbers. Suppose $a_n, b_n > 0$
for all $n$. Then

$$\frac{\sum_n a_n}{\sum_n b_n} \leq \max_n \frac{a_n}{b_n}. \tag{25}$$

This can be proven by letting $R = \max_n (a_n/b_n)$ and pointing out that
$a_n \leq R b_n$ for all $n$. Then

$$\frac{\sum_n a_n}{\sum_n b_n} \leq \frac{\sum_n R b_n}{\sum_n b_n} = R.$$

In our context, this implies that

$$\frac{\sum_k q_k S(\rho_k\|\rho_0)}{\sum_k q_k c_k} \leq \max_k \frac{S(\rho_k\|\rho_0)}{c_k} \tag{26}$$

and thus

$$\frac{\chi}{c} \leq \max_k \frac{S(\rho_k\|\rho_0)}{c_k}. \tag{27}$$

By using only the "most efficient state" (for which the maximum on the
right-hand side is achieved) and adopting the "poor student" strategy of
$\eta \to 0$, we can show that

$$\sup \frac{\chi}{c} = \max_k \frac{S(\rho_k\|\rho_0)}{c_k}. \tag{28}$$
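The step from Equation 24 to Equation 27 rests only on the real-number fact of Equation 25. The sketch below illustrates it with made-up numbers: the relative entropies `divergences` and costs `costs` are hypothetical values, not computed from any particular states.

```python
def cost_effectiveness(q, divergences, costs):
    """Left-hand side of Equation 26: sum_k q_k S_k / sum_k q_k c_k."""
    numerator = sum(qk * sk for qk, sk in zip(q, divergences))
    denominator = sum(qk * ck for qk, ck in zip(q, costs))
    return numerator / denominator

divergences = [1.2, 0.7, 2.5]   # hypothetical values of S(rho_k || rho_0), in bits
costs = [1.0, 0.4, 3.0]         # hypothetical costs c_k

# Right-hand side of Equation 26: the best single-state ratio S_k / c_k.
best = max(s / c for s, c in zip(divergences, costs))

mixed = cost_effectiveness([0.2, 0.5, 0.3], divergences, costs)      # a generic mixture
only_best = cost_effectiveness([0.0, 1.0, 0.0], divergences, costs)  # most efficient state only

print(f"mixture: {mixed:.4f}, most efficient state alone: {only_best:.4f}, maximum: {best:.4f}")
```

Note that the state with the largest relative entropy (the third one here) is not the most efficient; what matters is the ratio of relative entropy to cost.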

These general considerations of an abstract "cost" of creating various
signals have an especially elegant development if we consider the
thermodynamic cost of using the channel. The thermodynamic entropy
$S_\theta$ is related to the information-theoretic entropy $S(\rho)$ of the
state $\rho$ of the system by

$$S_\theta = k \ln 2 \; S(\rho). \tag{29}$$

The constant $k$ is Boltzmann's constant. If our system has a Hamiltonian
operator $H$, then the thermodynamic energy $E$ of the state is the
expectation of the Hamiltonian:

$$E = \langle H \rangle = \mathrm{Tr}\,\rho H. \tag{30}$$

Let us suppose that we have access to a thermal reservoir at temperature $T$.
Then the "zero cost" state $\rho_0$ is the thermal equilibrium state

$$\rho_0 = \frac{1}{Z} e^{-\beta H}, \tag{31}$$

where $\beta = 1/kT$ and $Z = \mathrm{Tr}\, e^{-\beta H}$. ($Z$ is the
partition function.)

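The free energy of the system in the presence of a thermal reservoir at
temperature $T$ is $F = E - T S_\theta$. For the equilibrium state $\rho_0$,

$$\begin{aligned}
F_0 &= \mathrm{Tr}\,\rho_0 H + kT \ln 2 \left( -\log Z - \frac{\beta}{\ln 2}\, \mathrm{Tr}\,\rho_0 H \right) \\
    &= -kT \ln 2 \, \log Z.
\end{aligned} \tag{32}$$

The thermodynamic cost of the state $\rho_1$ is just the difference
$F_1 - F_0$ between the free energies of $\rho_1$ and the equilibrium state
$\rho_0$. But this difference has a simple relation to the relative entropy.
First, we note

$$\mathrm{Tr}\,\rho_1 \log \rho_0 = -\log Z - \frac{\beta}{\ln 2}\, \mathrm{Tr}\,\rho_1 H, \tag{33}$$

from which it follows that [14]

$$\begin{aligned}
F_1 - F_0 &= \mathrm{Tr}\,\rho_1 H + kT \ln 2 \; \mathrm{Tr}\,\rho_1 \log \rho_1 + kT \ln 2 \, \log Z \\
          &= kT \ln 2 \left( \mathrm{Tr}\,\rho_1 \log \rho_1 - \mathrm{Tr}\,\rho_1 \log \rho_0 \right),
\end{aligned}$$

$$F_1 - F_0 = kT \ln 2 \; S(\rho_1\|\rho_0). \tag{34}$$

If we use the signal state $\rho_1$ with probability $\eta$, then the average
thermodynamic cost is $f = \eta (F_1 - F_0)$. The number of bits sent per
unit free energy is therefore

$$\frac{\chi}{f} \leq \frac{\eta}{f}\, S(\rho_1\|\rho_0) = \frac{1}{kT \ln 2}. \tag{35}$$

The same bound holds for all choices of the state $\rho_1$, and therefore for
all ensembles of signal states.

We can approach this upper bound if we make $\eta$ small, so that

$$\sup \frac{\chi}{f} = \frac{1}{kT \ln 2}. \tag{36}$$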

In short, for any coding and decoding scheme that makes use of the quantum
channel, the maximum number of bits that can be sent per unit free energy
is just $(kT \ln 2)^{-1}$. Phrased another way, the minimum free energy cost
per bit is $kT \ln 2$.

This analysis can shed some light on Landauer's principle [15], which
states that the minimum thermodynamic cost of information erasure is
$kT \ln 2$ per bit. From this point of view, information erasure is simply
information transmission into the environment, which requires the
expenditure of an irreducible amount of free energy.
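The relation between free energy and relative entropy (Equation 34) can be checked for a two-level system. The sketch below assumes NumPy and works in units where $kT = 1$; the Hamiltonian `H` and the signal state `rho1` are hypothetical examples.

```python
import numpy as np

def von_neumann(rho, tol=1e-12):
    """von Neumann entropy S(rho) in bits."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > tol]
    return float(-np.sum(vals * np.log2(vals)))

def rel_entropy(rho, sigma, tol=1e-12):
    """Quantum relative entropy S(rho||sigma) in bits (Equation 2)."""
    r_vals = np.linalg.eigvalsh(rho)
    s_vals, s_vecs = np.linalg.eigh(sigma)
    weights = np.real(np.diag(s_vecs.conj().T @ rho @ s_vecs))
    if np.any((s_vals <= tol) & (weights > tol)):
        return np.inf
    keep_r, keep_s = r_vals > tol, s_vals > tol
    return float(np.sum(r_vals[keep_r] * np.log2(r_vals[keep_r]))
                 - np.sum(weights[keep_s] * np.log2(s_vals[keep_s])))

kT = 1.0                                   # assumption: energies measured in units of kT
beta = 1.0 / kT
H = np.diag([0.0, 1.5])                    # hypothetical two-level Hamiltonian

boltzmann = np.exp(-beta * np.diag(H))     # Boltzmann weights of the energy levels
Z = boltzmann.sum()                        # partition function
rho0 = np.diag(boltzmann / Z)              # thermal equilibrium state (Equation 31)

def free_energy(rho):
    """F = E - T S_theta, with S_theta = k ln2 S(rho) (Equations 29 and 30)."""
    return float(np.trace(rho @ H).real - kT * np.log(2) * von_neumann(rho))

plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho1 = 0.8 * np.outer(plus, plus) + 0.2 * np.eye(2) / 2   # hypothetical signal state

cost = free_energy(rho1) - free_energy(rho0)              # F_1 - F_0
predicted = kT * np.log(2) * rel_entropy(rho1, rho0)      # right-hand side of Equation 34
print(cost, predicted)
```

The equilibrium free energy itself should equal $-kT \ln 2 \, \log Z = -kT \ln Z$, in agreement with Equation 32.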
4 Optimal signal ensembles



Now we consider $\chi$-maximizing ensembles of states from a given set
$\mathcal{A}$ of available (output) states, without regard to the "cost" of
each state. Our discussion in Section 2 tells us that the
$\chi$-maximizing ensemble is the one to use if we wish to maximize the
classical information transfer from Alice to Bob via the quantum channel.
Call an ensemble that maximizes $\chi$ an "optimal" signal ensemble, and
denote the maximum value of $\chi$ by $\chi^*$. (The results of this section
are developed in more detail in [16].)

The first question is, of course, whether an optimal ensemble exists. It
is conceivable that, though there is a least upper bound $\chi^*$ to the
possible values of $\chi$, no particular ensemble in $\mathcal{A}$ achieves
it. (This would be similar to the results in the last section, in which the
optimal cost effectiveness of the channel is only achieved in a limit.)
However, an optimal ensemble does exist. Uhlmann [17] has proven a result
that goes most of the way. Suppose our underlying Hilbert space
$\mathcal{H}$ has dimension $d$ and the set $\mathcal{A}$ of available states
is convex and compact. Then given a fixed average state $\rho$, there exists
an ensemble of at most $d^2$ signal states $\rho_a$ that achieves the maximum
value of $\chi$ for that particular $\rho$. The problem we are considering is
to maximize $\chi$ over all choices of $\rho$ in $\mathcal{A}$. Since Uhlmann
has shown that each $\rho$-fixed optimal ensemble need involve no more than
$d^2$ elements, we only need to maximize $\chi$ over ensembles that contain
$d^2$ or fewer members. The set of such ensembles is compact and $\chi$ is a
continuous function on this set, so $\chi$ achieves its maximum value
$\chi^*$ for some ensemble with at most $d^2$ elements.

Suppose that the state $\rho_a$ occurs with probability $p_a$ in some
ensemble, leading to the average state $\rho$ and a Holevo quantity $\chi$.
We will now consider how $\chi$ changes if we modify the ensemble slightly.
In the modified ensemble, a new state $\omega$ occurs with probability
$\eta$ and the state $\rho_a$ occurs with probability $(1-\eta)p_a$. For the
modified ensemble,

$$\rho' = \eta\,\omega + (1-\eta)\rho, \tag{37}$$

$$\chi' = \eta S(\omega\|\rho') + (1-\eta) \sum_a p_a S(\rho_a\|\rho'). \tag{38}$$

We can apply Donald's identity to these ensembles in two different ways.
First, we can take the original ensemble and treat $\rho'$ as the other
state ($\sigma$ in Eq. 13), obtaining:

$$\sum_a p_a S(\rho_a\|\rho') = \chi + S(\rho\|\rho'). \tag{39}$$

Substituting this expression into the expression for $\chi'$ yields:

$$\chi' = \eta S(\omega\|\rho') + (1-\eta) \left( \chi + S(\rho\|\rho') \right),$$

$$\Delta\chi = \chi' - \chi = \eta \left( S(\omega\|\rho') - \chi \right) + (1-\eta) S(\rho\|\rho'). \tag{40}$$

Our second application of Donald's identity is to the modified ensemble,
taking the original average state $\rho$ to play the role of the other state:

$$\eta S(\omega\|\rho) + (1-\eta)\chi = \chi' + S(\rho'\|\rho), \tag{41}$$

$$\Delta\chi = \eta \left( S(\omega\|\rho) - \chi \right) - S(\rho'\|\rho). \tag{42}$$

Since the relative entropy is never negative, we can conclude that

$$\eta \left( S(\omega\|\rho') - \chi \right) \leq \Delta\chi \leq \eta \left( S(\omega\|\rho) - \chi \right). \tag{43}$$

This gives upper and lower bounds for the change in $\chi$ if we mix in an
additional state $\omega$ to our original ensemble. The bounds are "tight",
since as $\eta \to 0$, $S(\omega\|\rho') \to S(\omega\|\rho)$.

Very similar bounds for $\Delta\chi$ apply if we make more elaborate
modifications of our original ensemble, involving more than one additional
signal state. This is described in [16].



We say that an ensemble has the maximal distance property if and only
if, for any $\omega$ in $\mathcal{A}$,

$$S(\omega\|\rho) \leq \chi, \tag{44}$$

where $\rho$ is the average state and $\chi$ is the Holevo quantity for the
ensemble. This property gives an interesting characterization of optimal
ensembles:

Theorem: An ensemble is optimal if and only if it has the maximal
distance property.

We give the essential ideas of the proof here; further details can be found
in [16].

Suppose our ensemble has the maximal distance property. Then, if we add the
state $\omega$ with probability $\eta$, the change $\Delta\chi$ satisfies

$$\Delta\chi \leq \eta \left( S(\omega\|\rho) - \chi \right) \leq 0. \tag{45}$$

In other words, we cannot increase $\chi$ by mixing in an additional state.
Consideration of more general changes to the ensemble leads to the same
conclusion that $\Delta\chi \leq 0$. Thus, the ensemble must be optimal, and
$\chi = \chi^*$.






Conversely, suppose that the ensemble is optimal (with $\chi = \chi^*$).
Could there be a state $\omega$ in $\mathcal{A}$ such that
$S(\omega\|\rho) > \chi^*$? If there were such an $\omega$, then by choosing
$\eta$ small enough we could make $S(\omega\|\rho') > \chi^*$, and so

$$\Delta\chi \geq \eta \left( S(\omega\|\rho') - \chi^* \right) > 0. \tag{46}$$

But this contradicts the fact that, if the original ensemble is optimal,
$\Delta\chi \leq 0$ for any change in the ensemble. Thus, no such $\omega$
exists and the optimal ensemble satisfies the maximal distance property.

Two corollaries follow immediately from this theorem. First, we note that
the support of the average state $\rho$ of an optimal ensemble must contain
the support of every state $\omega$ in $\mathcal{A}$. Otherwise, the relative
entropy $S(\omega\|\rho) = \infty$, contradicting the maximal distance
property. The fact that $\rho$ has the largest support possible could be
called the maximal support property of an optimal ensemble.

Second, we recall that $\chi^*$ is just the average relative entropy
distance of the members of the optimal ensemble from the average state
$\rho$:

$$\chi^* = \sum_a p_a S(\rho_a\|\rho).$$

Since $S(\rho_a\|\rho) \leq \chi^*$ for each $a$, it follows that whenever
$p_a > 0$ we must have

$$S(\rho_a\|\rho) = \chi^*. \tag{47}$$

We might call this the equal distance property of an optimal ensemble.

We can now give an explicit formula for $\chi^*$ that does not optimize over
ensembles, but only over states in $\mathcal{A}$. From Equation 13, for any
state $\sigma$,

$$\chi \leq \sum_a p_a S(\rho_a\|\sigma), \tag{48}$$

and thus

$$\chi \leq \max_\omega S(\omega\|\sigma), \tag{49}$$

where the maximum is taken over all $\omega$ in $\mathcal{A}$. We apply this
inequality to the optimal ensemble, finding the lowest such upper bound for
$\chi^*$:

$$\chi^* \leq \min_\sigma \left( \max_\omega S(\omega\|\sigma) \right). \tag{50}$$

But since the optimal ensemble has the maximal distance property, we know
that

$$\chi^* = \max_\omega S(\omega\|\rho) \tag{51}$$

for the optimal average state $\rho$. Therefore,

$$\chi^* = \min_\sigma \left( \max_\omega S(\omega\|\sigma) \right). \tag{52}$$
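The maximal distance and equal distance properties can be illustrated on a simple example. The sketch below assumes NumPy and takes, purely as a hypothetical example channel, a qubit depolarizing map $\omega = \lambda\breve{\rho} + (1-\lambda)\,1/2$ with $\lambda = 0.6$; the candidate ensemble consists of the outputs of two orthogonal pure inputs with equal probabilities. The code samples random channel outputs rather than proving anything about the whole set $\mathcal{A}$.

```python
import numpy as np

def rel_entropy(rho, sigma, tol=1e-12):
    """Quantum relative entropy S(rho||sigma) in bits (Equation 2)."""
    r_vals = np.linalg.eigvalsh(rho)
    s_vals, s_vecs = np.linalg.eigh(sigma)
    weights = np.real(np.diag(s_vecs.conj().T @ rho @ s_vecs))
    if np.any((s_vals <= tol) & (weights > tol)):
        return np.inf
    keep_r, keep_s = r_vals > tol, s_vals > tol
    return float(np.sum(r_vals[keep_r] * np.log2(r_vals[keep_r]))
                 - np.sum(weights[keep_s] * np.log2(s_vals[keep_s])))

def depolarize(rho, lam):
    """Hypothetical example channel: lam * rho + (1 - lam) * identity / 2."""
    return lam * rho + (1 - lam) * np.eye(2) / 2

def random_qubit_state(rng):
    """Random (generally mixed) qubit density operator used as a channel input."""
    a = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

lam = 0.6
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
signals = [depolarize(np.outer(k, k), lam) for k in (ket0, ket1)]
probs = [0.5, 0.5]
rho_avg = sum(p * s for p, s in zip(probs, signals))
chi = sum(p * rel_entropy(s, rho_avg) for p, s in zip(probs, signals))  # Equation 12

# Sample other available output states and measure their distance to rho_avg.
rng = np.random.default_rng(1)           # arbitrary seed
distances = [rel_entropy(depolarize(random_qubit_state(rng), lam), rho_avg)
             for _ in range(200)]

# A different reference state sigma gives a larger worst-case distance (Equation 50).
sigma = np.diag([0.7, 0.3])
worst_for_sigma = max(rel_entropy(s, sigma) for s in signals)

print(f"chi = {chi:.4f}, largest sampled S(omega||rho) = {max(distances):.4f}")
print(f"largest signal distance from sigma = {worst_for_sigma:.4f}")
```

No sampled output lies farther from the average state than $\chi$, each signal state lies at distance exactly $\chi$, and replacing the average state by another `sigma` only increases the worst-case distance, as the min-max formula of Equation 52 suggests.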

5 Additivity for quantum channels 

The quantity $\chi^*$ is an asymptotically achievable upper bound to the
amount of classical information that can be sent using available states of
the channel system Q. It is therefore tempting to identify $\chi^*$ as the
classical capacity of the quantum channel. But there is a subtlety here,
which involves an important unsolved problem of quantum information theory.

Specifically, suppose that two quantum systems A and B are available
for use as communication channels. The two systems evolve independently
according to the product map $\mathcal{E}^A \otimes \mathcal{E}^B$. Each
system can be considered as a separate channel, or the joint system AB can
be analyzed as a single channel. It is not known whether the following
holds in general:

$$\chi^{AB*} = \chi^{A*} + \chi^{B*}. \tag{53}$$

Since separate signal ensembles for A and B can be combined into a product
ensemble for AB, it is clear that $\chi^{AB*} \geq \chi^{A*} + \chi^{B*}$.
However, the joint system AB also has other possible signal ensembles that
use entangled input states and that might perhaps have a Holevo bound for
the output states greater than $\chi^{A*} + \chi^{B*}$.



Equation 53 is the "additivity conjecture" for the classical capacity of a
quantum channel. If the conjecture is false, then the use of entangled input
states would sometimes increase the amount of classical information that can
be sent over two or more independent channels. The classical capacity of a
channel (which is defined asymptotically, using many instances of the same
channel) would thus be greater than $\chi^*$ for a single instance of a
channel. On the other hand, if the conjecture holds, then $\chi^*$ is the
classical capacity of the quantum channel.

Numerical calculations to date [18] support the additivity conjecture for
a variety of channels. Recent work [19, 20] gives strong evidence that
Equation 53 holds for various special cases, including channels described by
unital maps. We present here another partial result: $\chi^*$ is additive
for any "half-noisy" channel, that is, a dual channel that is represented by
a map of the form $\mathcal{I}^A \otimes \mathcal{E}^B$, where
$\mathcal{I}^A$ is the identity map on A.

Suppose the joint system AB evolves according to the map
$\mathcal{I}^A \otimes \mathcal{E}^B$, and let $\rho^A$ and $\rho^B$ be the
average output states of optimal signal ensembles for A and B individually.
We will show that the product ensemble (with average state
$\rho^A \otimes \rho^B$) is optimal by showing that this ensemble has the
maximal distance property. That is, suppose we have another, possibly
entangled input state of AB that leads to the output state $\omega^{AB}$.
Our aim is to prove that
$S(\omega^{AB}\|\rho^A \otimes \rho^B) \leq \chi^{A*} + \chi^{B*}$. From the
definition of $S(\cdot\|\cdot)$ we can show that

$$\begin{aligned}
S(\omega^{AB}\|\rho^A \otimes \rho^B)
  &= -S(\omega^{AB}) - \mathrm{Tr}\,\omega^A \log \rho^A - \mathrm{Tr}\,\omega^B \log \rho^B \\
  &= S(\omega^A) + S(\omega^B) - S(\omega^{AB})
     + S(\omega^A\|\rho^A) + S(\omega^B\|\rho^B).
\end{aligned} \tag{54}$$

(The right-hand expression has an interesting structure;
$S(\omega^A) + S(\omega^B) - S(\omega^{AB})$ is clearly analogous to the
mutual information defined in Equation 9.)
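The decomposition in Equation 54 holds for any joint state and any product reference state, which makes it easy to test numerically. The following sketch assumes NumPy; the dimensions, the random states and the helper `partial_traces` are illustrative assumptions, and the random states are not claimed to come from any particular channel.

```python
import numpy as np

def random_density(d, rng):
    """Random full-rank density operator on a d-dimensional space."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def von_neumann(rho, tol=1e-12):
    """von Neumann entropy S(rho) in bits."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > tol]
    return float(-np.sum(vals * np.log2(vals)))

def rel_entropy(rho, sigma, tol=1e-12):
    """Quantum relative entropy S(rho||sigma) in bits (Equation 2)."""
    r_vals = np.linalg.eigvalsh(rho)
    s_vals, s_vecs = np.linalg.eigh(sigma)
    weights = np.real(np.diag(s_vecs.conj().T @ rho @ s_vecs))
    if np.any((s_vals <= tol) & (weights > tol)):
        return np.inf
    keep_r, keep_s = r_vals > tol, s_vals > tol
    return float(np.sum(r_vals[keep_r] * np.log2(r_vals[keep_r]))
                 - np.sum(weights[keep_s] * np.log2(s_vals[keep_s])))

def partial_traces(omega_ab, d_a, d_b):
    """Reduced states of A and B for a joint state ordered as A (major) x B (minor)."""
    t = omega_ab.reshape(d_a, d_b, d_a, d_b)
    omega_a = np.einsum('ijkj->ik', t)   # trace over B
    omega_b = np.einsum('ijil->jl', t)   # trace over A
    return omega_a, omega_b

rng = np.random.default_rng(3)           # arbitrary seed
d_a, d_b = 2, 3
omega_ab = random_density(d_a * d_b, rng)
rho_a = random_density(d_a, rng)
rho_b = random_density(d_b, rng)
omega_a, omega_b = partial_traces(omega_ab, d_a, d_b)

lhs = rel_entropy(omega_ab, np.kron(rho_a, rho_b))
mutual = von_neumann(omega_a) + von_neumann(omega_b) - von_neumann(omega_ab)
rhs = mutual + rel_entropy(omega_a, rho_a) + rel_entropy(omega_b, rho_b)
print(lhs, rhs)
```

The index ordering in `partial_traces` matches the convention of `np.kron`, in which the first factor labels the slowly varying index.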
Since A evolves according to the identity map $\mathcal{I}^A$, it is easy to
see that $\chi^{A*} = \log d$, where $d = \dim \mathcal{H}^A$, and

$$\rho^A = \frac{1}{d}\, 1^A. \tag{55}$$

From this it follows that

$$S(\omega^A) + S(\omega^A\|\rho^A) = \log d = \chi^{A*} \tag{56}$$

for any $\omega^A$. This accounts for two of the terms on the right-hand side
of Equation 54. The remaining three terms require a more involved analysis.
The final joint state $\omega^{AB}$ is a mixed state, but we can always
introduce a third system C that "purifies" the state. That is, we can find
$|\Omega^{ABC}\rangle$ such that

$$\omega^{AB} = \mathrm{Tr}_C\, |\Omega^{ABC}\rangle\langle\Omega^{ABC}|. \tag{57}$$

Since the overall state of ABC is a pure state,
$S(\omega^{AB}) = S(\omega^C)$, where $\omega^C$ is the state obtained by
partial trace over A and B. Furthermore, imagine that a complete measurement
is made on A, with the outcome $k$ occurring with probability $p_k$. For a
given measurement outcome $k$, the subsequent state of the remaining system
BC will be $|\Omega_k^{BC}\rangle$. Letting

$$\omega_k^B = \mathrm{Tr}_C\, |\Omega_k^{BC}\rangle\langle\Omega_k^{BC}|, \qquad
\omega_k^C = \mathrm{Tr}_B\, |\Omega_k^{BC}\rangle\langle\Omega_k^{BC}|, \tag{58}$$

we have that $S(\omega_k^B) = S(\omega_k^C)$ for all $k$. Furthermore, by
locality,

$$\omega^B = \sum_k p_k \omega_k^B, \qquad \omega^C = \sum_k p_k \omega_k^C. \tag{59}$$

In other words, we have written both $\omega^B$ and $\omega^C$ as ensembles
of states.

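We can apply this to get an upper bound on the remaining terms in
Equation 54:

$$\begin{aligned}
S(\omega^B) - S(\omega^{AB}) + S(\omega^B\|\rho^B)
  &= S(\omega^B) - \sum_k p_k S(\omega_k^B)
     - S(\omega^C) + \sum_k p_k S(\omega_k^C) + S(\omega^B\|\rho^B) \\
  &\leq \chi^B + S(\omega^B\|\rho^B),
\end{aligned} \tag{60}$$

where $\chi^B$ is the Holevo quantity for the ensemble of $\omega_k^B$
states, and the inequality holds because the corresponding quantity for the
ensemble of $\omega_k^C$ states is non-negative. Donald's identity permits
us to write

$$S(\omega^B) - S(\omega^{AB}) + S(\omega^B\|\rho^B) \leq \sum_k p_k S(\omega_k^B\|\rho^B). \tag{61}$$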

k 

The B states $\omega_k^B$ are all available output states of the B channel.
These states are obtained by making a complete measurement on system A when
the joint system AB is in the state $\omega^{AB}$. But this state was
obtained from some initial AB state and a dynamical map
$\mathcal{I}^A \otimes \mathcal{E}^B$. This map commutes with the measurement
operation on A alone, so we could equally well make the measurement before
the action of $\mathcal{I}^A \otimes \mathcal{E}^B$. The A-measurement
outcome $k$ would then determine the input state of B, which would evolve
into $\omega_k^B$. Thus, for each $k$, $\omega_k^B$ is a possible output of
the $\mathcal{E}^B$ map.

Since $\rho^B$ has the maximal distance property and the states
$\omega_k^B$ are available outputs of the channel,
$S(\omega_k^B\|\rho^B) \leq \chi^{B*}$ for every $k$. Combining Equations
54, 56 and 61, we find the desired inequality:

$$S(\omega^{AB}\|\rho^A \otimes \rho^B) \leq \chi^{A*} + \chi^{B*}. \tag{62}$$

This demonstrates that the product of optimal ensembles for A and B also
has the maximal distance property for the possible outputs of the joint
channel, and so this product ensemble must be optimal. It follows that
$\chi^{AB*} = \chi^{A*} + \chi^{B*}$ in this case.

Our result has been phrased for the case in which A undergoes "trivial"
dynamics $\mathcal{I}^A$, but the proof also works without modification if
the time evolution of A is unitary; that is, A experiences "distortion" but
not "noise". If only one of the two systems is noisy, then $\chi^*$ is
additive.

The additivity conjecture for $\chi^*$ is closely related to another
additivity conjecture, the "minimum output entropy" conjecture [19, 20].
Suppose A and B are systems with independent evolution described by
$\mathcal{E}^A \otimes \mathcal{E}^B$, and let $\rho^{AB}$ be an output state
of the channel with minimal entropy $S(\rho^{AB})$. Is $\rho^{AB}$ a product
state $\rho^A \otimes \rho^B$? The answer is not known in general; but it is
quite easy to show this in the half-noisy case that we consider here.



6 Maximizing coherent information 



When we turn from the transmission of classical information to the
transmission of quantum information, it will be helpful to adopt an explicit
description of the channel dynamics, instead of merely specifying the set of
available output states $\mathcal{A}$. Suppose the quantum system Q undergoes
a dynamical evolution described by the map $\mathcal{E}$. Since $\mathcal{E}$
is a trace-preserving, completely positive map, we can always find a
representation of $\mathcal{E}$ as a unitary evolution of a larger system
[21]. In this representation, we imagine that an additional "environment"
system E is present, initially in a pure state $|0^E\rangle$, and that Q and
E interact via the unitary evolution operator $U^{QE}$. That is,

$$\rho^Q = \mathcal{E}(\breve{\rho}^Q) = \mathrm{Tr}_E\, U^{QE} \left( \breve{\rho}^Q \otimes |0^E\rangle\langle 0^E| \right) U^{QE\dagger}. \tag{63}$$

For convenience, we denote an initial state of a system by the breve accent
(as in $\breve{\rho}^Q$), and omit this symbol for final states.

The problem of sending quantum information through our channel can 
be viewed in one of two ways: 

1. An unknown pure quantum state of Q is to be transmitted. In this
case, our criterion of success is the average fidelity $\bar{F}$, defined as
follows. Suppose the input state $|\breve{\phi}_k^Q\rangle$ occurs with
probability $p_k$ and leads to the output state $\rho_k^Q$. Then

$$\bar{F} = \sum_k p_k \langle \breve{\phi}_k^Q | \rho_k^Q | \breve{\phi}_k^Q \rangle . \tag{64}$$

In general, $\bar{F}$ depends not only on the average input state
$\breve{\rho}^Q$ but also on the particular pure state input ensemble.

2. A second "bystander" system R is present, and the joint system RQ is
initially in a pure entangled state $|\breve{\Psi}^{RQ}\rangle$. The system R has
"trivial" dynamics described by the identity map $\mathcal{I}$, so that the joint
system evolves according to $\mathcal{I} \otimes \mathcal{E}$, yielding a final
state $\rho^{RQ}$. Success is determined in this case by the entanglement
fidelity $F_e$, defined by

$$F_e = \langle \breve{\Psi}^{RQ} | \rho^{RQ} | \breve{\Psi}^{RQ} \rangle . \tag{65}$$

It turns out, surprisingly, that $F_e$ depends only on $\mathcal{E}$ and the
input state $\breve{\rho}^Q$ of Q alone. That is, $F_e$ is an "intrinsic"
property of Q and its
dynamics, [p^] 

These two pictures of quantum information transfer are essentially
equivalent, since $F_e$ approaches unity if and only if $\bar{F}$ approaches
unity for every ensemble with the same average input state $\breve{\rho}^Q$. For
now we adopt the second point of view, in which the transfer of quantum
information is essentially the transfer of quantum entanglement (with the
bystander system R) through the channel.

The quantum capacity of a channel should be defined as the amount of
entanglement that can be transmitted through the channel with $F_e \to 1$,
if we allow ourselves to use the channel many times and employ quantum
error correction schemes [23]. At present it is not known how to calculate
this asymptotic capacity of the channel in terms of the properties of a single
instance of the channel.

Nevertheless, we can identify some quantities that are useful in describing
the quantum information conveyed by the channel [24]. A key quantity is
the coherent information $I^Q$, defined by

$$I^Q = S(\rho^Q) - S(\rho^{RQ}) . \tag{66}$$

This quantity is a measure of the final entanglement between R and Q. (The
initial entanglement is measured by the entropy $S(\breve{\rho}^Q)$ of the
initial state of Q, which of course equals $S(\rho^R)$. See Section 8 below.)
If we adopt a unitary representation for $\mathcal{E}$, then the overall system
RQE including the environment remains in a pure state from beginning to end,
and so $S(\rho^{RQ}) = S(\rho^E)$. Thus,

$$I^Q = S(\rho^Q) - S(\rho^E) . \tag{67}$$

Despite the apparent dependence of $I^Q$ on the systems R and E, it is in
fact a function only of the map $\mathcal{E}$ and the initial state
$\breve{\rho}^Q$ of Q. Like the entanglement fidelity $F_e$, it is an
"intrinsic" characteristic of the channel system Q and its dynamics.
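
As a numerical illustration of Equations 66 and 67, the sketch below evaluates $I^Q = S(\rho^Q) - S(\rho^E)$ for a single-qubit channel specified by two Kraus operators. The dephasing channel, the function names, and the use of the standard identity $(\rho^E)_{ij} = \mathrm{Tr}(K_i \breve{\rho}^Q K_j^\dagger)$ for the final environment state are assumptions chosen here for illustration; none of them appears in the text above.

```python
import math

def dagger(m):
    """Conjugate transpose of a 2x2 matrix stored as nested lists."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(m):
    return m[0][0] + m[1][1]

def entropy(rho):
    """von Neumann entropy (bits) of a 2x2 Hermitian, unit-trace matrix."""
    a, d, b = rho[0][0].real, rho[1][1].real, rho[0][1]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    # Closed-form eigenvalues; tiny or negative round-off values are skipped.
    return -sum(x * math.log2(x) for x in (mean + radius, mean - radius) if x > 1e-12)

def coherent_information(kraus, rho_in):
    """I^Q = S(rho^Q) - S(rho^E) for a qubit channel with two Kraus operators."""
    # Final state of Q: sum_i K_i rho K_i^dagger.
    terms = [matmul(matmul(k, rho_in), dagger(k)) for k in kraus]
    rho_q = [[terms[0][i][j] + terms[1][i][j] for j in range(2)] for i in range(2)]
    # Final state of E: (rho^E)_{ij} = Tr(K_i rho K_j^dagger).
    rho_e = [[trace(matmul(matmul(ki, rho_in), dagger(kj))) for kj in kraus]
             for ki in kraus]
    return entropy(rho_q) - entropy(rho_e)

def dephasing(p):
    """Kraus operators sqrt(1-p) I and sqrt(p) Z (hypothetical example channel)."""
    s0, s1 = math.sqrt(1 - p), math.sqrt(p)
    return [[[s0 + 0j, 0j], [0j, s0 + 0j]], [[s1 + 0j, 0j], [0j, -s1 + 0j]]]

mixed = [[0.5 + 0j, 0j], [0j, 0.5 + 0j]]   # maximally mixed input state
for p in (0.0, 0.1, 0.5):
    print(p, round(coherent_information(dephasing(p), mixed), 4))
```

For this particular channel and input the environment ends in the state diag(1 - p, p), so the result should agree with $1 - h(p)$, where $h$ is the binary entropy function.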

It can be shown that the coherent information $I^Q$ does not increase if
the map $\mathcal{E}$ is followed by a second independent map $\mathcal{E}'$,
giving an overall dynamics described by $\mathcal{E}' \circ \mathcal{E}$. That
is, the coherent information cannot be increased by any "quantum data
processing" on the channel outputs. The coherent information is also closely
related to quantum error correction. Perfect quantum error correction,
resulting in $F_e = 1$ for the final state, is possible if and only if the
channel loses no coherent information, so that $I^Q = S(\breve{\rho}^Q)$. These
and other properties lead us to consider $I^Q$ as a good measure of the quantum
information that is transmitted through the channel.

The coherent information has an intriguing relation to the Holevo quantity
$\chi$, and thus to classical information transfer (and to relative entropy)
[25]. Suppose we describe the input state $\breve{\rho}^Q$ by an ensemble of
pure states $|\breve{\phi}_k^Q\rangle$:

$$\breve{\rho}^Q = \sum_k p_k |\breve{\phi}_k^Q\rangle\langle\breve{\phi}_k^Q| . \tag{68}$$

We adopt a unitary representation for the evolution and note that the initial
pure state $|\breve{\phi}_k^Q\rangle \otimes |0^E\rangle$ evolves into a pure,
possibly entangled state $|\phi_k^{QE}\rangle$. Thus, for each k the entropies
of the final states of Q and E are equal:

$$S(\rho_k^Q) = S(\rho_k^E) . \tag{69}$$

It follows that

$$
\begin{aligned}
I^Q &= S(\rho^Q) - S(\rho^E) \\
    &= S(\rho^Q) - \sum_k p_k S(\rho_k^Q) - S(\rho^E) + \sum_k p_k S(\rho_k^E) \\
I^Q &= \chi^Q - \chi^E .
\end{aligned}
\tag{70}
$$

Remarkably, the difference $\chi^Q - \chi^E$ depends only on $\mathcal{E}$ and
the average input state $\breve{\rho}^Q$, not the details of the environment E
or the exact choice of pure state input ensemble.
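
A short worked example may make the ensemble independence concrete. The channel chosen here (single-qubit dephasing with probability $p$, with the environment recording whether the phase flip occurred) and the two input ensembles are assumptions made for illustration, not taken from the text; $h(p)$ denotes the binary entropy. Both ensembles average to the maximally mixed input state.

```latex
\begin{align*}
&\text{Channel: } \rho \mapsto (1-p)\,\rho + p\,Z\rho Z ,
  \qquad h(p) = -p\log_2 p - (1-p)\log_2(1-p) . \\[4pt]
&\text{Ensemble } \{|0\rangle, |1\rangle\},\ p_k = \tfrac12 : \\
&\qquad \chi^Q = S(\rho^Q) - \sum_k p_k S(\rho_k^Q) = 1 - 0 = 1 , \\
&\qquad \chi^E = S(\rho^E) - \sum_k p_k S(\rho_k^E) = h(p) - 0 = h(p) . \\[4pt]
&\text{Ensemble } \{|+\rangle, |-\rangle\},\ p_k = \tfrac12 : \\
&\qquad \chi^Q = 1 - h(p) , \qquad \chi^E = h(p) - h(p) = 0 . \\[4pt]
&\text{In both cases } I^Q = \chi^Q - \chi^E = 1 - h(p) .
\end{align*}
```

The individual Holevo quantities change with the ensemble, but their difference does not, as Equation 70 requires.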

The quantities $\chi^Q$ and $\chi^E$ are related to the classical information
transfer to the output system Q and to the environment E, respectively. Thus,
Equation 70 relates the classical and quantum information properties of the
channel. This relation has been used to analyze the privacy of quantum 
cryptographic channels |SIJ. We will use it here to give a relative entropy 



characterization of the input state $\breve{\rho}^Q$ that maximizes the coherent
information of the channel.

Let us suppose that $\breve{\rho}^Q$ is an input state that maximizes the
coherent information $I^Q$. If we change the input state to

$$\breve{\rho}^{Q\prime} = (1 - \eta) \breve{\rho}^Q + \eta \breve{\omega}^Q , \tag{71}$$

for some pure state $\breve{\omega}^Q$, we produce some change $\Delta I^Q$ in
the coherent information. Viewing $\breve{\rho}^Q$ as an ensemble of pure
states, this change amounts to a modification of that ensemble; and such a
modification leads to changes in the output ensembles for both system Q and
system E. Thus,

$$\Delta I^Q = \Delta\chi^Q - \Delta\chi^E . \tag{72}$$

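We can apply the earlier bounds on the change in $\chi$ under such a
modification of an ensemble to bound both $\Delta\chi^Q$ and $\Delta\chi^E$ and
obtain a lower bound for $\Delta I^Q$:

$$
\begin{aligned}
\Delta I^Q &\geq \eta \left( \mathcal{S}(\omega^Q \| \rho^{Q\prime}) - \chi^Q \right)
             - \eta \left( \mathcal{S}(\omega^E \| \rho^E) - \chi^E \right) \\
\Delta I^Q &\geq \eta \left( \mathcal{S}(\omega^Q \| \rho^{Q\prime})
             - \mathcal{S}(\omega^E \| \rho^E) - I^Q \right) .
\end{aligned}
\tag{73}
$$

Here $\omega^Q$ and $\omega^E$ are the final states of Q and E that result from
the input $\breve{\omega}^Q$, and $\rho^{Q\prime}$ is the final state of Q for
the modified input. Since we assume that $I^Q$ is maximized for the input
$\breve{\rho}^Q$, $\Delta I^Q \leq 0$ when we modify the input state. This must
be true for every value of $\eta$ in the relation above. Whenever
$\mathcal{S}(\omega^Q \| \rho^Q)$ is finite, we can conclude that

$$\mathcal{S}(\omega^Q \| \rho^Q) - \mathcal{S}(\omega^E \| \rho^E) \leq I^Q . \tag{74}$$

This is analogous to the maximum distance property for optimal signal
ensembles, except that it is the difference of two relative entropy distances
that is bounded above by the maximum of $I^Q$.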



Let us write Equation 70 in terms of relative entropy, imagining that the
input state $\breve{\rho}^Q$ is written in terms of an ensemble of pure states
$|\breve{\phi}_k^Q\rangle$:

$$I^Q = \sum_k p_k \left( \mathcal{S}(\rho_k^Q \| \rho^Q) - \mathcal{S}(\rho_k^E \| \rho^E) \right) . \tag{75}$$

Every input pure state $|\breve{\phi}_k^Q\rangle$ in the input ensemble with
$p_k > 0$ will be in the support of $\breve{\rho}^Q$, and so Equation 74 holds.
Each term of the average in Equation 75 is therefore bounded above by $I^Q$,
while the average itself equals $I^Q$. Therefore, we can conclude that

$$I^Q = \mathcal{S}(\rho_k^Q \| \rho^Q) - \mathcal{S}(\rho_k^E \| \rho^E) \tag{76}$$

for every such state in the ensemble. Furthermore, any pure state in the
support of $\breve{\rho}^Q$ is a member of some pure state ensemble for
$\breve{\rho}^Q$.

This permits us to draw a remarkable conclusion. If $\breve{\rho}^Q$ is the
input state that maximizes the coherent information $I^Q$ of the channel, then
for any pure state $\breve{\omega}^Q$ in the support of $\breve{\rho}^Q$,

$$I^Q = \mathcal{S}(\omega^Q \| \rho^Q) - \mathcal{S}(\omega^E \| \rho^E) . \tag{77}$$

This result is roughly analogous to the equal distance property for optimal
signal ensembles. Together with Equation 74 it provides a strong
characterization of the state that maximizes coherent information.

The additivity problem for $\chi^*$ leads us to ask whether the maximum of the
coherent information is additive when independent channels are combined.
In fact, there are examples known where $\max I^{AB} > \max I^A + \max I^B$; in
other words, entanglement between independent channels can increase the 
amount of coherent information that can be sent through them |2(J. The 



asymptotic behavior of coherent information and its precise connection to 
quantum channel capacities are questions yet to be resolved. 



7 Indeterminate length quantum coding 

In the previous section we saw that the relative entropy can be used to analyze 
the coherent information "capacity" of a quantum channel. Another issue in 
quantum information theory is quantum data compression [27], which seeks
to represent quantum information using the fewest number of qubits. In this 
section we will see that the relative entropy describes the cost of suboptimal 
quantum data compression. 

One approach to classical data compression is to use variable length codes,
in which the codewords are finite binary strings of various lengths. The
best-known examples are the Huffman codes. The Shannon entropy H(X)
of a random variable X is a lower bound to the average codeword length in
such codes, and for Huffman codes this average codeword length can be made
arbitrarily close to H(X). Thus, a Huffman code optimizes the use of
communication resources (number of bits required) in classical communication
without noise.
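
To make the comparison between average codeword length and Shannon entropy concrete, here is a minimal sketch of the standard binary Huffman construction. The function names and the example distributions are illustrative choices, not part of the paper. For a single symbol at a time the Huffman average length lies between H(X) and H(X) + 1; approaching H(X) arbitrarily closely in general requires coding blocks of symbols.

```python
import heapq
import math

def huffman_lengths(probs):
    """Return the codeword length given to each symbol by a binary Huffman code."""
    # Heap entries: (probability, tie-breaker, symbols contained in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1        # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def shannon_entropy(probs):
    """Shannon entropy H(X) in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_length(probs, lengths):
    return sum(p * l for p, l in zip(probs, lengths))

dyadic = [0.5, 0.25, 0.125, 0.125]      # probabilities of the form 2^-m
generic = [0.4, 0.3, 0.2, 0.1]
for probs in (dyadic, generic):
    lengths = huffman_lengths(probs)
    print(lengths, average_length(probs, lengths), shannon_entropy(probs))
```

For the dyadic distribution the average length equals the entropy exactly, which is the classical counterpart of the special case discussed later in this section for density operators with eigenvalues of the form $2^{-m}$.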

There are analogous codes for the compression of quantum information. 
Since coherent superpositions of codewords must be allowed as codewords, 
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these are called indeterminate length quantum codes J^J. A quantum ana- 
logue to Huffman coding was recently described by Braunstein et al. An 



account of the theory of indeterminate length quantum codes, including the 
quantum Kraft inequality and the condensability condition (see below), will 
be presented in a forthcoming paper [29]. Here we will outline a few results



and demonstrate a connection to the relative entropy. 

The key idea in constructing an indeterminate length code is that the 
codewords themselves must carry their own length information. For a clas- 
sical variable length code, this requirement can be phrased in two ways. A 
uniquely decipherable code is one in which any string of N codewords can be 
correctly separated into its individual codewords, while a prefix-free code is 
one in which no codeword is an initial segment of another codeword. The 
lengths of the codewords in each case satisfy the Kraft-McMillan inequality: 

$$\sum_k 2^{-l_k} \leq 1 , \tag{78}$$

where the sum is over the codewords and $l_k$ is the length of the $k$th
codeword. Every prefix-free code is uniquely decipherable, so the prefix- 
free property is a more restrictive property. Nevertheless, it turns out that 
any uniquely decipherable code can be replaced by a prefix-free code with 
the same codeword lengths. 
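
The last statement can be illustrated constructively. The sketch below checks the Kraft-McMillan sum for a list of codeword lengths and, when the sum does not exceed 1, builds a prefix-free binary code with exactly those lengths using the usual canonical assignment (codewords taken in order of increasing length). The function names are hypothetical and the construction is one standard choice, not necessarily the one the authors have in mind.

```python
def kraft_sum(lengths):
    """Left-hand side of the Kraft-McMillan inequality."""
    return sum(2.0 ** -l for l in lengths)

def prefix_free_code(lengths):
    """Build binary codewords with the given lengths, assuming the Kraft sum is <= 1."""
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate the Kraft-McMillan inequality")
    code = {}
    value, previous = 0, 0
    # Assign codewords in order of increasing length.
    for index in sorted(range(len(lengths)), key=lambda i: lengths[i]):
        length = lengths[index]
        value <<= length - previous          # extend the running counter to the new length
        code[index] = format(value, "0{}b".format(length))
        value += 1
        previous = length
    return code

def is_prefix_free(words):
    """True if no codeword is an initial segment of a different codeword."""
    return not any(a != b and b.startswith(a) for a in words for b in words)

code = prefix_free_code([3, 1, 3, 2])
print(code, is_prefix_free(list(code.values())))
```

A list such as `[1, 1, 2]` has Kraft sum 1.25 and is rejected, consistent with the fact that no uniquely decipherable code can have those lengths.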

There are analogous conditions for indeterminate length quantum codes, 
but these properties must be phrased carefully because we allow coherent su- 
perpositions of codewords. For example, a classical prefix-free code is some- 
times called an "instantaneous" code, since as soon as a complete codeword 
arrives we can recognize it at once and decipher it immediately. However, if 
an "instantaneous" decoding procedure were to be attempted for a quantum 
prefix-free code, it would destroy coherences between codewords of differ- 
ent lengths. Quantum codes require that the entire string of codewords be 
deciphered together. 

The property of an indeterminate length quantum code that is analogous
to unique decipherability is called condensability. We digress briefly to
describe the condensability condition. We focus on zero-extended forms (zef)
of our codewords. That is, we consider that our codewords occupy an initial
segment of a qubit register of fixed length n, with $|0\rangle$'s following.
(Clearly n must be chosen large enough to contain the longest codeword.) The
set of all zef codewords spans a subspace of the Hilbert space of register
states. We imagine that the output of a quantum information source has been
mapped unitarily to the zef codeword space of the register. Our challenge is
to take N such registers and "pack" them together in a way that can exploit
the fact that some of the codewords are shorter than others.

If codeword states must carry their own length information, there must
be a length observable $\Lambda$ on the zef codeword space with the following
two properties:

• The eigenvalues of $\Lambda$ are integers $1, \ldots, n$, where n is the
length of the register.

• If $|\psi_{\mathrm{zef}}\rangle$ is an eigenstate of $\Lambda$ with eigenvalue
$l$, then it has the form

$$|\psi_{\mathrm{zef}}\rangle = |\psi^{1 \cdots l} \, 0^{l+1 \cdots n}\rangle . \tag{79}$$

That is, the last $n - l$ qubits in the register are in the state $|0\rangle$
for a zef codeword of length $l$.

For register states not in the zef subspace, we can take $\Lambda = \infty$.

A code is condensable if the following condition holds: For any N, there
is a unitary operator U (depending on N) that maps

$$\underbrace{|\psi_{1,\mathrm{zef}}\rangle \otimes \cdots \otimes |\psi_{N,\mathrm{zef}}\rangle}_{Nn \text{ qubits}}
  \longrightarrow \underbrace{|\Psi_{1 \cdots N}\rangle}_{Nn \text{ qubits}}$$

with the property that, if the individual codewords are all length eigenstates,
then U maps the codewords to a zef string of the Nn qubits, that is, one
with $|0\rangle$'s after the first $L = l_1 + \cdots + l_N$ qubits:

$$|\psi_1^{1 \cdots l_1} \, 0^{l_1+1 \cdots n}\rangle \otimes \cdots \otimes
  |\psi_N^{1 \cdots l_N} \, 0^{l_N+1 \cdots n}\rangle
  \longrightarrow |\Psi^{1 \cdots L} \, 0^{L+1 \cdots Nn}\rangle .$$

The unitary operator U thus "packs" N codewords, given in their zef forms,
into a "condensed" string that has all of the trailing $|0\rangle$'s at the end.
The unitary character of the packing protocol automatically yields an
"unpacking" procedure given by $U^{-1}$. Thus, if the quantum code is
condensable, a packed string of N codewords can be coherently sorted out into
separated zef codewords.

The quantum analogue of the Kraft-McMillan inequality states that, for
any indeterminate length quantum code that is condensable, the length
observable $\Lambda$ on the subspace of zef codewords must satisfy

$$\mathrm{Tr} \, 2^{-\Lambda} \leq 1 , \tag{80}$$

where we have restricted our trace to the zef subspace. We can construct a
density operator $\omega$ (a positive operator of unit trace) on the zef
subspace by letting $K = \mathrm{Tr} \, 2^{-\Lambda} \leq 1$ and

$$\omega = \frac{1}{K} \, 2^{-\Lambda} . \tag{81}$$

The density operator $\omega$ is generally not the same as the actual density
operator $\rho$ of the zef codewords produced by the quantum information
source. The average codeword length is

$$
\begin{aligned}
\bar{l} &= \mathrm{Tr} \, \rho \Lambda \\
        &= - \mathrm{Tr} \, \rho \log \left( 2^{-\Lambda} \right) \\
        &= - \mathrm{Tr} \, \rho \log \omega - \log K \\
\bar{l} &= S(\rho) + \mathcal{S}(\rho \| \omega) - \log K .
\end{aligned}
\tag{82}
$$

Since $\log K \leq 0$ and the relative entropy is positive definite,

$$\bar{l} \geq S(\rho) . \tag{83}$$

The average codeword length must always be at least as great as the von
Neumann entropy of the information source.

Equality for Equation 83 can be approached asymptotically using block
coding and a quantum analogue of Huffman (or Shannon-Fano) coding. For
special cases in which the eigenvalues of $\rho$ are of the form $2^{-m}$, a
code exists for which $\bar{l} = S(\rho)$, without the asymptotic limit. In
either case, we say that a code satisfying $\bar{l} = S(\rho)$ is a length
optimizing quantum code.

Equation 82 tells us that, if we have a length optimizing code, $K = 1$ and

$$\rho = \omega = 2^{-\Lambda} . \tag{84}$$

The condensed string of N codewords has Nn qubits, but we can discard
all but about $N\bar{l}$ of them and still retain high fidelity. That is,
$\bar{l}$ is the asymptotic number of qubits that must be used per codeword to
represent the quantum information faithfully.

Suppose that we have an indeterminate length quantum code that is designed
for the wrong density operator. That is, our code is length optimizing
for some other density operator $\omega$, but $\rho \neq \omega$. Then
(recalling that $K = 1$ for a length optimizing code, even if it is optimizing
for the wrong density operator),

$$\bar{l} = S(\rho) + \mathcal{S}(\rho \| \omega) . \tag{85}$$

$S(\rho)$ tells us the number of qubits necessary to represent the quantum
information if we used a length optimizing code for $\rho$. (As we have
mentioned, such codes always exist in an asymptotic sense.) However, to achieve
high fidelity in the situation where we have used a code designed for
$\omega$, we have to use at least $\bar{l}$ qubits per codeword, an additional
cost of $\mathcal{S}(\rho \| \omega)$ qubits per codeword.
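
The following sketch evaluates the two sides of Equation 85 for single-qubit density operators written in Bloch-vector form. It treats $\Lambda = -\log_2 \omega$ formally, ignoring the requirement that the eigenvalues of $\Lambda$ be integers, so it should be read only in the asymptotic sense mentioned above. The Bloch-vector parametrization, the function names, and the example states are assumptions introduced for illustration.

```python
import math

def spectrum(bloch):
    """Eigenvalues and unit axis of (I + r.sigma)/2 for a Bloch vector r."""
    r = math.sqrt(sum(x * x for x in bloch))
    axis = [x / r for x in bloch] if r > 0 else [0.0, 0.0, 1.0]
    return (1 + r) / 2, (1 - r) / 2, axis

def entropy(bloch):
    """von Neumann entropy S(rho) in bits."""
    plus, minus, _ = spectrum(bloch)
    return -sum(v * math.log2(v) for v in (plus, minus) if v > 0)

def average_length(rho, omega):
    """Tr(rho Lambda) with Lambda = -log2(omega); omega must have |r| < 1."""
    mu_plus, mu_minus, axis = spectrum(omega)
    overlap = sum(a * b for a, b in zip(rho, axis))
    # Tr(rho P) = (1 +/- overlap)/2 for the two eigenprojectors P of omega.
    return -((1 + overlap) / 2 * math.log2(mu_plus)
             + (1 - overlap) / 2 * math.log2(mu_minus))

def relative_entropy(rho, omega):
    """S(rho || omega) = -Tr(rho log2 omega) - S(rho), in bits."""
    return average_length(rho, omega) - entropy(rho)

rho = (0.0, 0.0, 0.6)      # eigenvalues 0.8 and 0.2 along the z axis
omega = (0.6, 0.0, 0.0)    # same eigenvalues, but along the x axis
print(entropy(rho), relative_entropy(rho, omega), average_length(rho, omega))
```

In this example the code is matched to the right eigenvalues but the wrong eigenbasis, and the extra cost $\mathcal{S}(\rho \| \omega)$ works out to 0.6 qubits per codeword.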

This result gives us an interpretation of the relative entropy function
$\mathcal{S}(\rho \| \omega)$ in terms of the physical resources necessary to
accomplish some task: in this case, the additional cost (in qubits) of
representing the quantum information described by $\rho$ using a coding scheme
optimized for $\omega$. This is entirely analogous to the situation for
classical codes and classical relative entropy [1]. A fuller development of
this analysis will appear in [29].



8 Relative entropy of entanglement 

One recent application of relative entropy has been to quantify the
entanglement of a mixed quantum state of two systems [30]. Suppose Alice and
Bob share a joint quantum system AB in the state $\rho^{AB}$. This state is said
to be separable if it is a product state or else a probabilistic combination of
product states:

$$\rho^{AB} = \sum_k p_k \, \rho_k^A \otimes \rho_k^B . \tag{86}$$

Without loss of generality, we can if we wish take the elements in this en- 
semble of product states to be pure product states. Systems in separable 
states display statistical correlations having perfectly ordinary "classical" 
properties — that is, they do not violate any sort of Bell inequality. A sep- 
arable state of A and B could also be created from scratch by Alice and 
Bob using only local quantum operations (on A and B separately) and the 
exchange of classical information. 

States which are not separable are said to be entangled. These states 
cannot be made by local operations and classical communication; in other 
words, their creation requires the exchange of quantum information between 
Alice and Bob. The characterization of entangled states and their possible 
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transformations has been a central issue in much recent work on quantum 
information theory. 

A key question is the quantification of entanglement, that is, finding
numerical measures of the entanglement of a quantum state $\rho^{AB}$ that have
useful properties. If the joint system AB is in a pure state
$|\Psi^{AB}\rangle$, so that the subsystem states are

$$\rho^A = \mathrm{Tr}_B \, |\Psi^{AB}\rangle\langle\Psi^{AB}| , \qquad
  \rho^B = \mathrm{Tr}_A \, |\Psi^{AB}\rangle\langle\Psi^{AB}| , \tag{87}$$

then the entropy $S(\rho^A) = S(\rho^B)$ can be used to measure the entanglement
of A and B. This measure has many appealing properties. It is zero if and
only if $|\Psi^{AB}\rangle$ is separable (and thus a product state). For an
"EPR pair" of qubits, that is, a state of the general form

$$\frac{1}{\sqrt{2}} \left( |0^A 0^B\rangle + |1^A 1^B\rangle \right) , \tag{88}$$

the subsystem entropy $S(\rho^A) = 1$ bit.
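
The subsystem entropy of a two-qubit pure state is easy to compute directly. The sketch below is an illustration with hypothetical names: it writes the state as $\sum_{ij} c_{ij} |i^A j^B\rangle$ and uses the fact that $\rho^A = C C^\dagger$ has unit trace and determinant $|\det C|^2$, which fixes its two eigenvalues.

```python
import math

def subsystem_entropy(c00, c01, c10, c11):
    """Entropy (bits) of rho^A for the two-qubit pure state sum_ij c_ij |i^A j^B>."""
    norm = abs(c00) ** 2 + abs(c01) ** 2 + abs(c10) ** 2 + abs(c11) ** 2
    # After normalization, rho^A = C C^dagger has trace 1 and determinant |det C|^2.
    det = abs(c00 * c11 - c01 * c10) ** 2 / norm ** 2
    root = math.sqrt(max(0.0, 1 - 4 * det))
    eigenvalues = ((1 + root) / 2, (1 - root) / 2)
    return -sum(v * math.log2(v) for v in eigenvalues if v > 0)

s = 1 / math.sqrt(2)
print(subsystem_entropy(s, 0, 0, s))        # EPR pair
print(subsystem_entropy(1, 0, 0, 0))        # product state |0^A 0^B>
print(subsystem_entropy(math.sqrt(0.75), 0, 0, math.sqrt(0.25)))  # partially entangled
```

The EPR pair gives 1 bit and the product state gives zero, in line with the properties listed above.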

The subsystem entropy is also an asymptotic measure, both of the
resources necessary to create the particular entangled pure state, and of the
value of the state as a resource [31]. That is, for sufficiently large N,

• approximately $N S(\rho^A)$ EPR pairs are required to create N copies of
$|\Psi^{AB}\rangle$ by local operations and classical communication; and

• approximately $N S(\rho^A)$ EPR pairs can be created from N copies of
$|\Psi^{AB}\rangle$ by local operations and classical communication.



For mixed entangled states $\rho^{AB}$ of the joint system AB, things are not
so well-established. Several different measures of entanglement are known,
including [32]

• the entanglement of formation $E(\rho^{AB})$, which is the minimum
asymptotic number of EPR pairs required to create $\rho^{AB}$ by local
operations and classical communication; and

• the distillable entanglement $D(\rho^{AB})$, the maximum asymptotic number
of EPR pairs that can be created from $\rho^{AB}$ by entanglement purification
protocols involving local operations and classical communication.

Bennett et al. [33] further distinguish $D_1$ and $D_2$, the distillable
entanglements with respect to purification protocols that allow one-way and
two-way classical communication, respectively. All of these measures reduce to
the subsystem entropy $S(\rho^A)$ if $\rho^{AB}$ is a pure entangled state.

These entanglement measures are not all equal; furthermore, explicit
formulas for their calculation are not known in most cases. This motivates us
to consider alternate measures of entanglement with more tractable properties
and which have useful relations to the asymptotic measures $E$, $D_1$ and $D_2$.

A state $\rho^{AB}$ is entangled inasmuch as it is not a separable state, so it
makes sense to adopt as a measure of entanglement a measure of the distance
of $\rho^{AB}$ from the set $\Sigma^{AB}$ of separable states of AB. Using
relative entropy as our "distance", we define the relative entropy of
entanglement $E_r$ to be

$$E_r(\rho^{AB}) = \min_{\sigma^{AB} \in \Sigma^{AB}} \mathcal{S}(\rho^{AB} \| \sigma^{AB}) . \tag{89}$$

The relative entropy of entanglement has several handy properties. First of
all, it reduces to the subsystem entropy $S(\rho^A)$ whenever $\rho^{AB}$ is a
pure state. Second, suppose we write $\rho^{AB}$ as an ensemble of pure states
$|\psi_k^{AB}\rangle$. Then

$$E_r(\rho^{AB}) \leq \sum_k p_k S(\rho_k^A) \tag{90}$$

where $\rho_k^A = \mathrm{Tr}_B \, |\psi_k^{AB}\rangle\langle\psi_k^{AB}|$. It
follows from this that $E_r \leq E$ for any state $\rho^{AB}$.

Even more importantly, the relative entropy of entanglement $E_r$ can be
shown to be non-increasing on average under local operations by Alice and
Bob together with classical communication between them.

The quantum version of Sanov's theorem gives the relative entropy of
entanglement an interpretation in terms of the statistical distinguishability
of $\rho^{AB}$ and the "least distinguishable" separable state $\sigma^{AB}$. The
relative entropy of entanglement is thus a useful and well-motivated measure of
the entanglement of a state $\rho^{AB}$ of a joint system, both on its own terms
and as a surrogate for less tractable asymptotic measures.
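
As a brief check of the pure-state property, consider a pure state in Schmidt form and the separable state obtained by deleting its off-diagonal terms. The choice of this particular $\sigma^{AB}$ and the Schmidt-basis notation are assumptions introduced here for illustration. The calculation shows that this separable state sits at relative entropy "distance" $S(\rho^A)$, so $E_r$ is at most $S(\rho^A)$; the statement in the text is that no separable state does better.

```latex
\begin{align*}
|\Psi^{AB}\rangle &= \sum_i \sqrt{\lambda_i}\, |i^A i^B\rangle , \qquad
\sigma^{AB} = \sum_i \lambda_i\, |i^A i^B\rangle\langle i^A i^B| , \\
\mathcal{S}(\rho^{AB} \| \sigma^{AB})
  &= -S(\rho^{AB}) - \mathrm{Tr}\, \rho^{AB} \log \sigma^{AB} \\
  &= 0 - \langle \Psi^{AB} | \log \sigma^{AB} | \Psi^{AB} \rangle \\
  &= - \sum_i \lambda_i \log \lambda_i = S(\rho^A) .
\end{align*}
```

For an EPR pair, $\lambda_0 = \lambda_1 = 1/2$ and the result is 1 bit.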



9 Manipulating multiparticle entanglement 

The analysis in this section closely follows that of Linden et al. [33], who
provide a more detailed discussion of the main result here and its
applications.

Suppose Alice, Bob and Claire initially share three qubits in a "GHZ
state"

$$|\Psi^{ABC}\rangle = \frac{1}{\sqrt{2}} \left( |0^A 0^B 0^C\rangle + |1^A 1^B 1^C\rangle \right) . \tag{91}$$

The mixed state $\rho^{BC}$ shared by Bob and Claire is, in fact, not entangled
at all:

$$\rho^{BC} = \frac{1}{2} \left( |0^B 0^C\rangle\langle 0^B 0^C| + |1^B 1^C\rangle\langle 1^B 1^C| \right) . \tag{92}$$

No local operations performed by Bob and Claire can produce an entangled
state from this starting point. However, Alice can create entanglement for
Bob and Claire. Alice measures her qubit in the basis
$\{ |+^A\rangle, |-^A\rangle \}$, where

$$|\pm^A\rangle = \frac{1}{\sqrt{2}} \left( |0^A\rangle \pm |1^A\rangle \right) . \tag{93}$$

It is easy to verify that the state of Bob and Claire's qubits after this
measurement, depending on the measurement outcome, must be one of the two
states

$$\frac{1}{\sqrt{2}} \left( |0^B 0^C\rangle \pm |1^B 1^C\rangle \right) , \tag{94}$$

both of which are equivalent (up to a local unitary transformation by
either Bob or Claire) to an EPR pair. In other words, if Alice makes a local
measurement on her qubit and then announces the result by classical
communication, the GHZ triple can be converted into an EPR pair for Bob and
Claire.
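
The verification mentioned above can be carried out numerically. The sketch below, with hypothetical function names, stores the GHZ state as a dictionary of amplitudes, projects Alice's qubit onto each state of the $|\pm^A\rangle$ basis, and reports the outcome probability together with the entanglement (subsystem entropy) of the resulting state of B and C.

```python
import math

S = 1 / math.sqrt(2)
# GHZ amplitudes indexed by the bit values (a, b, c).
ghz = {(0, 0, 0): S, (1, 1, 1): S}
# Components of <+| and <-| on |0> and |1>; they are real, so no conjugation is needed.
basis = {"+": (S, S), "-": (S, -S)}

def measure_alice(state, bra):
    """Project Alice's qubit onto `bra`; return (probability, normalized BC amplitudes)."""
    bc = {}
    for (a, b, c), amp in state.items():
        bc[(b, c)] = bc.get((b, c), 0.0) + bra[a] * amp
    prob = sum(abs(x) ** 2 for x in bc.values())
    return prob, {k: v / math.sqrt(prob) for k, v in bc.items()}

def entanglement(bc):
    """Subsystem entropy (bits) of a normalized two-qubit pure state of B and C."""
    det = abs(bc.get((0, 0), 0) * bc.get((1, 1), 0)
              - bc.get((0, 1), 0) * bc.get((1, 0), 0)) ** 2
    root = math.sqrt(max(0.0, 1 - 4 * det))
    return -sum(v * math.log2(v) for v in ((1 + root) / 2, (1 - root) / 2) if v > 0)

results = {label: measure_alice(ghz, bra) for label, bra in basis.items()}
for label, (prob, bc) in results.items():
    print(label, round(prob, 6), bc, round(entanglement(bc), 6))
```

Each outcome occurs with probability 1/2 and leaves Bob and Claire with one bit of entanglement, with the relative sign of the two terms set by Alice's result.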

When considering the manipulation of quantum entanglement shared 
among several parties, we must therefore bear in mind that the entangle- 
ment between subsystems can both increase and decrease, depending on the 
situation. This raises several questions: Under what circumstances can Alice 
increase Bob and Claire's entanglement? How much can she do so? Are 
there any costs involved in the process? 

To study these questions, we must give a more detailed account of "lo- 
cal operations and classical communication". It turns out that Alice, Bob 
and Claire can realize any local operation on their joint system ABC by a 
combination of the following: 

• Local unitary transformations on the subsystems A, B and C; 

• Adjoining to a subsystem additional local "ancilla" qubits in a standard
state $|0\rangle$;

• Local ideal measurements on the (augmented) subsystems A, B and
C; and

• Discarding local ancilla qubits. 

Strictly speaking, though, we do not need to include the last item. That is, 
any protocol that involves discarding ancilla qubits can be replaced by one 
in which the ancillas are simply "set aside" — not used in future steps, but 
not actually gotten rid of. In a similar vein, we can imagine that the ancilla 
qubits required are already present in the subsystems A, B and C, so the 
second item in our list is redundant. We therefore need to consider only local 
unitary transformations and local ideal measurements. 

What does classical communication add to this? It is sufficient to suppose 
that Alice, Bob and Claire have complete information — that is, they are 
aware of all operations and the outcomes of all measurements performed by 
each of them, and thus know the global state of ABC at every stage. Any 
protocol that involved an incomplete sharing of information could be replaced 
by one with complete sharing, simply by ignoring some of the messages that 
are exchanged. 

Our local operations (local unitary transformations and local ideal
measurements) always take an initial pure state to a final pure state. That is,
if ABC starts in the joint state $|\Psi^{ABC}\rangle$, then the final state will
be a pure state $|\Psi_k^{ABC}\rangle$ that depends on the joint outcome k of
all the measurements performed. Thus, ABC is always in a pure state known to
all parties.

It is instructive to consider the effect of local operations on the entropies of
the various subsystems of ABC. Local unitary transformations leave
$S(\rho^A)$, $S(\rho^B)$ and $S(\rho^C)$ unchanged. But suppose that Alice makes
an ideal measurement on her subsystem, obtaining outcome k with probability
$p_k$. The initial global state is $|\Psi^{ABC}\rangle$ and the final global
state is $|\Psi_k^{ABC}\rangle$, depending on k. For the initial subsystem
states, we have that

$$S(\rho^A) = S(\rho^{BC}) \tag{95}$$

since the overall state is a pure state. Similarly, the various final subsystem
states satisfy

$$S(\rho_k^A) = S(\rho_k^{BC}) . \tag{96}$$

But an operation on A cannot change the average state of BC:

$$\rho^{BC} = \sum_k p_k \rho_k^{BC} . \tag{97}$$

Concavity of the entropy gives

$$S(\rho^{BC}) \geq \sum_k p_k S(\rho_k^{BC}) \tag{98}$$

and therefore

$$S(\rho^A) \geq \sum_k p_k S(\rho_k^A) . \tag{99}$$

Concavity also tells us that $S(\rho^B) \geq \sum_k p_k S(\rho_k^B)$, etc., and
similar results hold for local measurements performed by Bob or Claire.

We now return to the question of how much Alice can increase the
entanglement shared by Bob and Claire. Let us measure the bipartite
entanglement of the system BC (which may be in a mixed state) by the relative
entropy of entanglement $E_r(\rho^{BC})$, and let $\sigma^{BC}$ be the separable
state of BC for which

$$E_r(\rho^{BC}) = \mathcal{S}(\rho^{BC} \| \sigma^{BC}) . \tag{100}$$

No local unitary operation can change $E_r(\rho^{BC})$; furthermore, no local
measurement by Bob or Claire can increase $E_r(\rho^{BC})$ on average. We need
only consider an ideal measurement performed by Alice on system A. Once again
we suppose that outcome k of this measurement occurs with probability $p_k$,
and once again Equation 97 holds. Donald's identity tells us that

$$\sum_k p_k \mathcal{S}(\rho_k^{BC} \| \sigma^{BC})
  = \sum_k p_k \mathcal{S}(\rho_k^{BC} \| \rho^{BC}) + \mathcal{S}(\rho^{BC} \| \sigma^{BC}) . \tag{101}$$

But $E_r(\rho_k^{BC}) \leq \mathcal{S}(\rho_k^{BC} \| \sigma^{BC})$ for every k,
leading to the following inequality:

$$\sum_k p_k E_r(\rho_k^{BC}) - E_r(\rho^{BC}) \leq \sum_k p_k \mathcal{S}(\rho_k^{BC} \| \rho^{BC}) . \tag{102}$$

We recognize the right-hand side of this inequality as $\chi$ for the ensemble
of post-measurement states of BC, which we can rewrite using the definition of
$\chi$ in Equation 11. This yields:

$$
\begin{aligned}
\sum_k p_k E_r(\rho_k^{BC}) - E_r(\rho^{BC})
  &\leq S(\rho^{BC}) - \sum_k p_k S(\rho_k^{BC}) \\
  &= S(\rho^A) - \sum_k p_k S(\rho_k^A) ,
\end{aligned}
\tag{103}
$$

since the overall state of ABC is pure at every stage.
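
As a consistency check, Equation 103 can be evaluated for the GHZ example of Equations 91 through 94, where Alice measures in the $|\pm^A\rangle$ basis and each outcome has probability $1/2$. The values used are those stated in the text: $\rho^{BC}$ is separable before the measurement, Bob and Claire hold an EPR pair afterwards, and Alice's qubit ends in a pure state.

```latex
\begin{align*}
\sum_k p_k E_r(\rho_k^{BC}) - E_r(\rho^{BC})
  &= \left( \tfrac12 \cdot 1 + \tfrac12 \cdot 1 \right) - 0 = 1 \text{ bit} , \\
S(\rho^A) - \sum_k p_k S(\rho_k^A)
  &= 1 - \left( \tfrac12 \cdot 0 + \tfrac12 \cdot 0 \right) = 1 \text{ bit} .
\end{align*}
```

The inequality holds with equality in this case: the entanglement gained by Bob and Claire exactly matches the entropy lost by Alice's subsystem.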
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To summarize, in our model (in which all measurements are ideal, all 
classical information is shared, and no classical or quantum information is 
ever discarded), the following principles hold: 

• The entropy of any subsystem A cannot be increased on average by 
any local operations. 

• The relative entropy of entanglement of two subsystems B and C can- 
not be increased on average by local operations on those two subsys- 
tems. 

• The relative entropy of entanglement of B and C can be increased by
a measurement performed on a third subsystem A, but the average
increase in $E_r^{BC}$ is no larger than the average decrease in the entropy
of A.

We say that a joint state $|\Psi_1^{ABC}\rangle$ can be transformed reversibly
into $|\Psi_2^{ABC}\rangle$ if, for sufficiently large N, N copies of
$|\Psi_1^{ABC}\rangle$ can be transformed with high probability (via local
operations and classical communication) to approximately N copies of
$|\Psi_2^{ABC}\rangle$, and vice versa. The qualifiers in this description are
worth a comment or two. "High probability" reflects the fact that, since the
local operations may involve measurements, the actual final state may depend
on the exact measurement outcomes. "Approximately N copies" means more than
$(1 - \epsilon) N$ copies, for some suitably small $\epsilon$ determined in
advance. We denote this reversibility relation by

$$|\Psi_1^{ABC}\rangle \leftrightarrow |\Psi_2^{ABC}\rangle .$$

Two states that are related in this way are essentially equivalent as
"entanglement resources". In the large N limit, they may be interconverted with
arbitrarily little loss.

Our results for entropy and relative entropy of entanglement allow us to
place necessary conditions on the reversible manipulation of multiparticle
entanglement. For example, if $|\Psi_1^{ABC}\rangle \leftrightarrow
|\Psi_2^{ABC}\rangle$, then the two states must have exactly the same subsystem
entropies. Suppose instead that $S(\rho_1^A) < S(\rho_2^A)$. Then the
transformation of N copies of $|\Psi_1^{ABC}\rangle$ into about N copies of
$|\Psi_2^{ABC}\rangle$ would involve an increase in the entropy of subsystem A,
which cannot happen on average.

In a similar way, we can see that $|\Psi_1^{ABC}\rangle$ and
$|\Psi_2^{ABC}\rangle$ must have the same relative entropies of entanglement
for every pair of subsystems. Suppose instead that
$E_{r,1}^{BC} < E_{r,2}^{BC}$. Then the transformation of N copies of
$|\Psi_1^{ABC}\rangle$ into about N copies of $|\Psi_2^{ABC}\rangle$ would
require an increase in $E_r^{BC}$. This can take place if a measurement is
performed on A, but as we have seen this would necessarily involve a decrease
in $S(\rho^A)$. Therefore, reversible transformations of multiparticle
entanglement must preserve both subsystem entropies and the entanglement
(measured by $E_r$) of pairs of subsystems.

As a simple example of this, suppose Alice, Bob and Claire share two 
GHZ states. Each subsystem has an entropy of 2.0 bits. This would also 
be the case if Alice, Bob and Claire shared three EPR pairs, one between 
each pair of participants. Does it follow that two GHZs can be transformed 
reversibly (in the sense described above) into three EPRs? 

No. If the three parties share two GHZ triples, then Bob and Claire are
in a completely unentangled state, with $E_r^{BC} = 0$. But in the "three EPR"
situation, the relative entropy of entanglement $E_r^{BC}$ is 1.0 bits, since
they share an EPR pair. Thus, two GHZs cannot be reversibly transformed into
three EPRs; indeed, 2N GHZs are inequivalent to 3N EPRs.

Though we have phrased our results for three parties, they are obviously 
applicable to situations with four or more separated subsystems. In reversible 
manipulations of multiparticle entanglement, all subsystem entropies (includ- 
ing the entropies of clusters of subsystems) must remain constant, as well as 
the relative entropies of entanglement of all pairs of subsystems (or clusters 
of subsystems). 



10 Remarks 

The applications discussed here show the power and the versatility of relative 
entropy methods in attacking problems of quantum information theory. We 
have derived useful fundamental results in classical and quantum informa- 
tion transfer, quantum data compression, and the manipulation of quantum 
entanglement. In particular, Donald's identity proves to be an extremely 
useful tool for deriving important inequalities. 

One of the insights provided by quantum information theory is that the
von Neumann entropy $S(\rho)$ has an interpretation (actually several
interpretations) as a measure of the resources necessary to perform an
information task. We have seen that the relative entropy also supports such
interpretations. We would especially like to draw attention to the results in
Section 3 on the cost of communication and Section 7 on quantum data
compression, which are presented here for the first time.

We expect that relative entropy techniques will be central to further 
work in quantum information theory. In particular, we think that they show 
promise in resolving the many perplexing additivity problems that face the 
theory at present. Section 5, though not a very strong result in itself, may 
point the way along this road. 

The authors wish to acknowledge the invaluable help of many colleagues. 
T. Cover, M. Donald, M. Nielsen, M. Ruskai, A. Uhlmann and V. Vedral
have given us indispensable guidance about the properties and meaning of
the relative entropy function. Our work on optimal signal ensembles and the 
additivity problem was greatly assisted by conversations with C. Fuchs, A. 
Holevo, J. Smolin, and W. Wootters. Results described here on reversibil- 
ity for transformations of multiparticle entanglement were obtained in the 
course of joint work with N. Linden and S. Popescu. We would like to thank 
the organizers of the AMS special session on "Quantum Information and 
Computation" for a stimulating meeting and an opportunity to pull together 
several related ideas into the present paper. We hope it will serve as a spur for 
the further application of relative entropy methods to problems of quantum 
information theory. 
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